Diagnosing dementia requires a battery of different tests, a process that is
complex and time-consuming. Early detection of dementia is crucial, as it can
help slow further deterioration. This paper utilizes a speech
recognition model to construct a dementia assessment system tailored for
Mandarin speakers during the picture description task. By training an
attention-based speech recognition model on voice data closely resembling
real-world scenarios, we have significantly enhanced the model's recognition
capabilities. Subsequently, we extracted the encoder from the speech
recognition model and added a linear layer for dementia assessment. We
collected Mandarin speech data from 99 subjects and acquired their clinical
assessments from a local hospital. We achieved an accuracy of 92.04% in
Alzheimer's disease detection and a mean absolute error of 9% in clinical
dementia rating score prediction.
One of the challenges in deploying a machine learning model is that the
model's performance degrades as the operating environment changes. To maintain
the performance, streaming active learning is used, in which the model is
retrained by adding a newly annotated sample to the training dataset if the
prediction of the sample is not certain enough. Although many streaming active
learning methods have been proposed for classification, few have been developed
for regression problems, which frequently arise in industrial settings.
In this paper, we propose to use the regression-via-classification framework
for streaming active learning for regression. Regression-via-classification
transforms regression problems into classification problems so that streaming
active learning methods proposed for classification problems can be applied
directly to regression problems. Experimental validation on four real data sets
shows that the proposed method can perform regression with higher accuracy at
the same annotation cost.
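The regression-via-classification recipe can be made concrete in a few lines: bin the targets, classify the bin, and query annotation when the classifier's confidence margin is small. The sketch below is a minimal illustration using a toy nearest-centroid classifier; the classifier, uncertainty measure, and threshold are all assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

class StreamingRvC:
    """Streaming active learning for regression via classification (sketch).

    Targets are discretized into bins; a toy nearest-centroid classifier
    predicts the bin; samples with a small top-2 probability margin are
    queried for annotation and added to the training set.
    """

    def __init__(self, bin_edges, margin=0.1):
        self.edges = np.asarray(bin_edges, float)
        self.margin = margin
        self.X, self.y = [], []

    def _label(self, y):
        # bin index of a continuous target (regression -> classification)
        return int(np.digitize(y, self.edges))

    def fit_sample(self, x, y):
        self.X.append(np.asarray(x, float))
        self.y.append(float(y))

    def _class_probs(self, x):
        X = np.array(self.X)
        labels = np.array([self._label(v) for v in self.y])
        classes = np.unique(labels)
        dists = np.array([np.linalg.norm(X[labels == c].mean(0) - x)
                          for c in classes])
        probs = np.exp(-dists)          # softmax over negative distances
        return classes, probs / probs.sum()

    def predict(self, x):
        # map the predicted class back to a value: mean target in that bin
        classes, probs = self._class_probs(np.asarray(x, float))
        best = classes[probs.argmax()]
        return float(np.mean([v for v in self.y if self._label(v) == best]))

    def wants_annotation(self, x):
        # margin-based uncertainty: query when the top two classes are close
        _, probs = self._class_probs(np.asarray(x, float))
        if len(probs) < 2:
            return True
        top2 = np.sort(probs)[-2:]
        return bool(top2[1] - top2[0] < self.margin)
```

Any streaming active-learning rule designed for classifiers (margin, entropy, least-confidence) can be plugged into `wants_annotation` unchanged, which is the point of the reduction.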
A common approach to learning mobile health (mHealth) intervention policies
is linear Thompson sampling. Two desirable mHealth policy features are (1)
pooling information across individuals and time and (2) incorporating a
time-varying baseline reward. Previous approaches pooled information across
individuals but not time, failing to capture trends in treatment effects over
time. In addition, these approaches did not explicitly model the baseline
reward, which limited the ability to precisely estimate the parameters in the
differential reward model. In this paper, we propose a novel Thompson sampling
algorithm, termed ``DML-TS-NNR'', that leverages (1) nearest-neighbors to
efficiently pool information on the differential reward function across users
and time and (2) the Double Machine Learning (DML) framework to explicitly
model baseline rewards and stay agnostic to the supervised learning algorithms
used. By explicitly modeling baseline rewards, we obtain smaller confidence
sets for the differential reward parameters. We offer theoretical guarantees on
the pseudo-regret, which are supported by empirical results. Importantly, the
DML-TS-NNR algorithm demonstrates robustness to potential misspecifications in
the baseline reward model.
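As a point of reference, the plain linear Thompson sampling baseline that the paper builds on can be sketched with a conjugate Bayesian linear model: draw a parameter vector from the posterior, act greedily on the draw, then update. The function names and the simple rank-one update below are illustrative; they are not the DML-TS-NNR algorithm itself.

```python
import numpy as np

def linear_thompson_step(V, b, contexts, sigma2=1.0, rng=None):
    """One round of plain linear Thompson sampling.

    V is the d x d precision matrix, b the d-dimensional response vector,
    and `contexts` holds one feature vector per candidate action.
    Returns the index of the chosen action.
    """
    rng = np.random.default_rng(rng)
    V_inv = np.linalg.inv(V)
    mu = V_inv @ b                                       # posterior mean
    theta = rng.multivariate_normal(mu, sigma2 * V_inv)  # posterior draw
    return int(np.argmax(contexts @ theta))

def linear_thompson_update(V, b, x, reward):
    """Rank-one conjugate update after observing (context, reward)."""
    return V + np.outer(x, x), b + reward * x
```

The paper's contribution replaces the single reward model above with a differential-reward model pooled across users and time, plus a separately estimated baseline reward.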
Move recognition in abstracts is crucial for effectively locating content and
clarifying an article's structure. Existing move recognition algorithms lack the
ability to learn word position information to obtain contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIE_AT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37% higher accuracy on the split dataset than on the original
dataset and a 7.55% improvement in accuracy over the basic comparison model.
While federated learning is promising for privacy-preserving collaborative
learning without revealing local data, it remains vulnerable to white-box
attacks and struggles to adapt to heterogeneous clients. Federated distillation
(FD), built upon knowledge distillation--an effective technique for
transferring knowledge from a teacher model to student models--emerges as an
alternative paradigm, which provides enhanced privacy guarantees and addresses
model heterogeneity. Nevertheless, challenges arise due to variations in local
data distributions and the absence of a well-trained teacher model, which leads
to misleading and ambiguous knowledge sharing that significantly degrades model
performance. To address these issues, this paper proposes a selective knowledge
sharing mechanism for FD, termed Selective-FD. It includes client-side
selectors and a server-side selector to accurately and precisely identify
knowledge from local and ensemble predictions, respectively. Empirical studies,
backed by theoretical insights, demonstrate that our approach enhances the
generalization capabilities of the FD framework and consistently outperforms
baseline methods.
The influx of massive amounts of data from current and upcoming cosmological
surveys necessitates compression schemes that can efficiently summarize the
data with minimal loss of information. We introduce a method that leverages the
paradigm of self-supervised machine learning in a novel manner to construct
representative summaries of massive datasets using simulation-based
augmentations. Deploying the method on hydrodynamical cosmological simulations,
we show that it can deliver highly informative summaries, which can be used for
a variety of downstream tasks, including precise and accurate parameter
inference. We demonstrate how this paradigm can be used to construct summary
representations that are insensitive to prescribed systematic effects, such as
the influence of baryonic physics. Our results indicate that self-supervised
machine learning techniques offer a promising new approach for the compression
of cosmological data, as well as its analysis.
Many functions characterising physical systems are additively separable. This
is the case, for instance, of mechanical Hamiltonian functions in physics,
population growth equations in biology, and consumer preference and utility
functions in economics. We consider the scenario in which a surrogate of a
function is to be tested for additive separability. The detection that the
surrogate is additively separable can be leveraged to improve further learning.
Hence, it is beneficial to have the ability to test for such separability in
surrogates. The mathematical approach is to test whether the mixed partial
derivative of the surrogate is zero or, empirically, lower than a threshold. We
present and empirically compare eight methods for computing the mixed partial
derivative of a surrogate function.
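The zero-mixed-partial criterion is easy to state concretely. The sketch below uses a central finite difference, one of many possible estimators (the paper compares eight), together with a thresholded check at sampled points; the step size and tolerance are illustrative choices.

```python
import math

def mixed_partial(f, x, y, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx dy)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

def looks_separable(f, points, tol=1e-3):
    """Declare f(x, y) additively separable if the mixed partial is ~0
    at every sampled point."""
    return all(abs(mixed_partial(f, x, y)) < tol for x, y in points)
```

For f(x, y) = x^2 + sin(y) the four function evaluations cancel exactly, so the estimate is zero up to round-off, while f(x, y) = x*y yields a mixed partial of 1 everywhere.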
While the applications of coresets have been growing, they have, barring a few
exceptions, mostly been limited to unsupervised settings. We consider
supervised classification problems and non-decomposable evaluation measures in
such settings. We show that coresets based on stratified uniform sampling have
excellent empirical performance that is also backed by theoretical guarantees.
We focus on the F1 score and Matthews Correlation Coefficient, two widely used
non-decomposable objective functions that are nontrivial to optimize, and
show that uniform coresets attain a lower bound for coreset size, and have good
empirical performance, comparable with ``smarter'' coreset construction
strategies.
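The two ingredients, stratified uniform sampling and a non-decomposable measure, can both be sketched in a few lines: proportional per-class allocation for the coreset, and MCC computed from confusion counts. The allocation rule below is a plain illustrative choice; the paper's guarantees concern how large the sample must be, which this sketch does not reproduce.

```python
import numpy as np

def stratified_coreset(y, m, rng=None):
    """Stratified uniform sampling: draw ~m points, uniformly within each
    class stratum with proportional allocation. Returns selected indices."""
    rng = np.random.default_rng(rng)
    idx = []
    classes, counts = np.unique(y, return_counts=True)
    for c, n in zip(classes, counts):
        k = max(1, round(m * n / len(y)))      # proportional allocation
        members = np.flatnonzero(y == c)
        idx.extend(rng.choice(members, size=min(k, n), replace=False))
    return np.array(idx)

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from binary confusion counts."""
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return num / den if den else 0.0
```

MCC is non-decomposable in the sense that it cannot be written as a sum of per-example losses, which is what makes subsampling guarantees for it nontrivial.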
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
High-resolution image generation with Generative Artificial Intelligence
(GenAI) has immense potential but, due to the enormous capital investment
required for training, it is increasingly centralised to a few large
corporations, and hidden behind paywalls. This paper aims to democratise
high-resolution GenAI by advancing the frontier of high-resolution generation
while remaining accessible to a broad audience. We demonstrate that existing
Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution
image generation. Our novel DemoFusion framework seamlessly extends open-source
GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated
Sampling mechanisms to achieve higher-resolution image generation. The
progressive nature of DemoFusion requires more passes, but the intermediate
results can serve as "previews", facilitating rapid prompt iteration.
We study monotone submodular maximization under general matroid constraints
in the online setting. We prove that online optimization of a large class of
submodular functions, namely, weighted threshold potential functions, reduces
to online convex optimization (OCO). This is precisely because functions in
this class admit a concave relaxation; as a result, OCO policies, coupled with
an appropriate rounding scheme, can be used to achieve sublinear regret in the
combinatorial setting. We show that our reduction extends to many different
versions of the online learning problem, including the dynamic regret, bandit,
and optimistic-learning settings.
The aim of this paper is to provide a theoretically founded investigation of
state-of-the-art learning approaches for inverse problems. We give an extended
definition of regularization methods and their convergence in terms of the
underlying data distributions, which paves the way for future theoretical
studies. Based on a simple spectral learning model previously introduced for
supervised learning, we investigate some key properties of different learning
paradigms for inverse problems, which can be formulated independently of
specific architectures. In particular we investigate the regularization
properties, bias, and critical dependence on training data distributions.
Moreover, our framework allows to highlight and compare the specific behavior
of the different paradigms in the infinite-dimensional limit.
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows
method to final states containing multiple neutrinos. The architecture can
natively scale for all combinations of object types and multiplicities in the
final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton
events, the momenta of both neutrinos and correlations between them are
reconstructed more accurately than when using the most popular standard
analytical techniques, and solutions are found for all events. Inference time
is significantly faster than competing methods, and can be reduced further by
evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to
$t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded
distributions are much closer to the limit of performance set by perfect
neutrino reconstruction than standard techniques. For the chosen double
differential observables $\nu^2$-Flows results in improved statistical
precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino
Weighting method and up to a factor of four in comparison to the Ellipse
approach.
The community has explored building private inference frameworks for
transformer-based large language models (LLMs) in a server-client setting,
where the server holds the model parameters and the client inputs its private
data (or prompt) for inference. However, these frameworks impose significant
overhead when the private inputs are forward propagated through the original
LLMs. In this paper, we show that substituting the computation- and
communication-heavy operators in the transformer architecture with
privacy-computing friendly approximations can greatly reduce the private
inference costs while having only a minor impact on model performance.
Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing
friendly model inference pipeline achieves a $5\times$ acceleration in
computation and an 80% reduction in communication overhead, while retaining
nearly identical accuracy.
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generative radiance fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generative radiance
fields.
Biomedical entity linking (BioEL) has achieved remarkable progress with the
help of pre-trained language models. However, existing BioEL methods usually
struggle to handle rare and difficult entities due to the long-tailed entity
distribution.
To address this limitation, we introduce a new scheme $k$NN-BioEL, which
provides a BioEL model with the ability to reference similar instances from the
entire training corpus as clues for prediction, thus improving the
generalization capabilities. Moreover, we design a contrastive learning
objective with dynamic hard negative sampling (DHNS) that improves the quality
of the retrieved neighbors during inference. Extensive experimental results
show that $k$NN-BioEL outperforms state-of-the-art baselines on several
datasets.
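The retrieval component can be sketched as interpolating the model's softmax over candidate entities with a distance-weighted kNN vote over cached training embeddings. The function name, the Euclidean metric, and the mixing weight below are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def knn_augmented_scores(query_emb, model_scores, cache_embs, cache_labels,
                         n_classes, k=4, lam=0.5, temp=1.0):
    """Mix a model's softmax over entities with a distance-weighted kNN vote
    retrieved from cached training embeddings."""
    d = np.linalg.norm(cache_embs - query_emb, axis=1)
    nn = np.argsort(d)[:k]                      # k nearest cached instances
    w = np.exp(-d[nn] / temp)                   # closer neighbors vote harder
    knn_p = np.zeros(n_classes)
    for weight, j in zip(w, nn):
        knn_p[cache_labels[j]] += weight
    knn_p /= knn_p.sum()
    model_p = np.exp(model_scores - model_scores.max())
    model_p /= model_p.sum()
    return lam * model_p + (1 - lam) * knn_p    # interpolated distribution
```

The benefit for long-tailed entities comes from the kNN term: a rare entity seen only a handful of times in training can still dominate the vote when its cached instances are the nearest neighbors.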
We present a deep Graph Convolutional Kernel Machine (GCKM) for
semi-supervised node classification in graphs. The method is built from two main
types of blocks: (i) We introduce unsupervised kernel machine layers
propagating the node features in a one-hop neighborhood, using implicit node
feature mappings. (ii) We specify a semi-supervised classification kernel
machine through the lens of the Fenchel-Young inequality. We derive an
effective initialization scheme and efficient end-to-end training algorithm in
the dual variables for the full architecture. The main idea underlying GCKM is
that, because of the unsupervised core, the final model can achieve higher
performance in semi-supervised node classification when few labels are
available for training. Experimental results demonstrate the effectiveness of
the proposed framework.
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof-of-concept and we hope will inspire future developments
towards computationally efficient IRL.
The promise of Mobile Health (mHealth) is the ability to use wearable sensors
to monitor participant physiology at high frequencies during daily life to
enable temporally-precise health interventions. However, a major challenge is
frequent missing data. Despite a rich imputation literature, existing
techniques are ineffective for the pulsative signals which comprise many
mHealth applications, and a lack of available datasets has stymied progress. We
address this gap with PulseImpute, the first large-scale pulsative signal
imputation challenge which includes realistic mHealth missingness models, an
extensive set of baselines, and clinically-relevant downstream tasks. Our
baseline models include a novel transformer-based architecture designed to
exploit the structure of pulsative signals. We hope that PulseImpute will
enable the ML community to tackle this significant and challenging task.
Can a machine or algorithm discover or learn Kepler's first law from
astronomical sightings alone? We emulate Johannes Kepler's discovery of the
equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a
physics-inspired tool for symbolic regression.
Exact Bayesian inference on state-space models (SSMs) is in general
intractable, and unfortunately, basic Sequential Monte Carlo (SMC) methods do
not yield correct approximations for complex models. In this paper, we propose
a mixed inference algorithm that computes closed-form solutions using belief
propagation as much as possible, and falls back to sampling-based SMC methods
when exact computations fail. This algorithm thus implements automatic
Rao-Blackwellization and is even exact for Gaussian tree models.
Policy learning in robot-assisted surgery (RAS) lacks data-efficient and
versatile methods that exhibit the desired motion quality for delicate surgical
interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a
novel method for imitation learning (IL) in RAS that focuses on gentle
manipulation of deformable objects. The approach combines the versatility of
diffusion-based imitation learning (DIL) with the high-quality motion
generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs).
This combination enables MPD to achieve gentle manipulation of deformable
objects, while maintaining data efficiency critical for RAS applications where
demonstration data is scarce. We evaluate MPD across various simulated tasks
and a real-world robotic setup, on both state and image observations. MPD
outperforms state-of-the-art DIL methods in success rate, motion quality, and
data efficiency.
Venn Prediction (VP) is a new machine learning framework for producing
well-calibrated probabilistic predictions. In particular, it provides
well-calibrated lower and upper bounds for the conditional probability of an
example belonging to each possible class of the problem at hand. This paper
proposes five VP methods based on Neural Networks (NNs), one of the most
widely used machine learning techniques. The proposed methods are
evaluated experimentally on four benchmark datasets and the obtained results
demonstrate the empirical well-calibratedness of their outputs and their
superiority over the outputs of the traditional NN classifier.
Artificial Intelligence (AI) based image analysis has an immense potential to
support diagnostic histopathology, including cancer diagnostics. However,
developing supervised AI methods requires large-scale annotated datasets. A
potentially powerful solution is to augment training data with synthetic data.
Latent diffusion models, which can generate high-quality, diverse synthetic
images, are promising. However, the most common implementations rely on
detailed textual descriptions, which are not generally available in this
domain. This work proposes a method that constructs structured textual prompts
from automatically extracted image features. We experiment with the PCam
dataset, composed of tissue patches only loosely annotated as healthy or
cancerous. We show that including image-derived features in the prompt, as
opposed to only healthy and cancerous labels, improves the Fr\'echet Inception
Distance (FID) from 178.8 to 90.2. We also show that pathologists find it
challenging to detect synthetic images, with a median sensitivity/specificity
of 0.55/0.55. Finally, we show that synthetic data effectively trains AI
models.
Offline reinforcement learning leverages pre-collected datasets of
transitions to train policies. It can serve as effective initialization for
online algorithms, enhancing sample efficiency and speeding up convergence.
However, when such datasets are limited in size and quality, offline
pre-training can produce sub-optimal policies and lead to degraded online
reinforcement learning performance. In this paper we propose a model-based data
augmentation strategy to maximize the benefits of offline reinforcement
learning pre-training and reduce the scale of data needed to be effective. Our
approach leverages a world model of the environment trained on the offline
dataset to augment states during offline pre-training. We evaluate our approach
on a variety of MuJoCo robotic tasks and our results show it can jump-start
online fine-tuning and substantially reduce - in some cases by an order of
magnitude - the required number of environment interactions.
This paper studies concept prerequisite relation prediction (CPRP), a
fundamental task in applying AI to education. CPRP is
usually formulated into a link-prediction task on a relationship graph of
concepts and solved by training the graph neural network (GNN) model. However,
current directed GNNs struggle to distinguish non-isomorphic graphs, which
reduces the expressivity of the resulting representations. We present a
permutation-equivariant directed GNN model by
introducing the Weisfeiler-Lehman test into directed GNN learning. Our method
is then used for CPRP and evaluated on three public datasets. The experimental
results show that our model delivers better prediction performance than the
state-of-the-art methods.
In this paper we propose a new method for training neural networks (NNs) for
frequency modulated continuous wave (FMCW) radar mutual interference
mitigation. Instead of training NNs to regress from interfered to clean radar
signals as in previous work, we train NNs directly on object detection maps. We
do so by performing a continuous relaxation of the cell-averaging constant
false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm
for object detection using radar. With this new training objective we are able
to increase object detection performance by a large margin. Furthermore, we
introduce separable convolution kernels to strongly reduce the number of
parameters and computational complexity of convolutional NN architectures for
radar applications. We validate our contributions with experiments on
real-world measurement data and compare them against signal processing
interference mitigation methods.
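For readers unfamiliar with CA-CFAR, the hard (non-differentiable) detector that the paper relaxes can be written in a few lines. Window sizes and the scale factor below are arbitrary illustrative choices; the continuous relaxation used for training replaces the hard threshold comparison with a smooth surrogate.

```python
import numpy as np

def ca_cfar(power, guard=2, train=8, scale=3.0):
    """Cell-averaging CFAR on a 1D power profile: flag cell i when its power
    exceeds `scale` times the mean of the surrounding training cells
    (guard cells around i are excluded from the noise estimate)."""
    n = len(power)
    hits = np.zeros(n, dtype=bool)
    for i in range(n):
        left = [j for j in range(i - guard - train, i - guard) if 0 <= j < n]
        right = [j for j in range(i + guard + 1, i + guard + 1 + train)
                 if 0 <= j < n]
        cells = left + right
        if cells and power[i] > scale * np.mean(power[cells]):
            hits[i] = True
    return hits
```

Training directly against the detections this produces, rather than against denoised signals, aligns the loss with the quantity radar systems actually care about.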
This paper presents a method for learning Hamiltonian dynamics from a limited
set of data points. The Hamiltonian vector field is found by regularized
optimization over a reproducing kernel Hilbert space of vector fields that are
inherently Hamiltonian, and where the vector field is required to be odd or
even. This is done with a symplectic kernel, and it is shown how this
symplectic kernel can be modified to be odd or even. The performance of the
method is validated in simulations for two Hamiltonian systems. It is shown
that the learned dynamics are Hamiltonian, and that the learned Hamiltonian
vector field can be prescribed to be odd or even.
Congenital heart disease (CHD) is a relatively rare disease that affects
patients at birth and results in extremely heterogeneous anatomical and
functional defects. 12-lead ECG signal is routinely collected in CHD patients
because it provides significant biomarkers for disease prognosis. However,
developing accurate machine learning models is challenging due to the lack of
large available datasets. Here, we suggest exploiting the Riemannian geometry
of the spatial covariance structure of the ECG signal to improve
classification. Firstly, we use covariance augmentation to mix samples across
the Riemannian geodesic between corresponding classes. Secondly, we propose
projecting the covariance matrices to their respective class Riemannian means to
enhance the quality of feature extraction via tangent space projection. We
perform several ablation experiments and demonstrate significant improvement
compared to traditional machine learning models and deep learning on ECG time
series data.
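The two geometric operations, mixing along the affine-invariant geodesic between SPD matrices and tangent-space projection at a reference mean, are standard constructions in the Riemannian geometry of covariance matrices. The sketch below shows those textbook operations, not the authors' specific pipeline.

```python
import numpy as np

def _powm(S, p):
    """Matrix power of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * w ** p) @ V.T

def _logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def geodesic_mix(A, B, t):
    """Point at fraction t along the affine-invariant geodesic between SPD
    covariance matrices A and B (t=0 gives A, t=1 gives B)."""
    Ah, Aih = _powm(A, 0.5), _powm(A, -0.5)
    return Ah @ _powm(Aih @ B @ Aih, t) @ Ah

def tangent_project(C, M):
    """Log-map of covariance C to the tangent space at reference point M."""
    Mh, Mih = _powm(M, 0.5), _powm(M, -0.5)
    return Mh @ _logm(Mih @ C @ Mih) @ Mh
```

Mixing same-class covariance pairs with `geodesic_mix` yields augmented samples that stay on the SPD manifold, and `tangent_project` at a class mean turns covariances into Euclidean feature vectors suitable for ordinary classifiers.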
Despite being a unique source of information on patients' status and disease
progression, clinical notes are characterized by high levels of duplication and
information redundancy. In general domain text, it has been shown that
deduplication does not harm language model (LM) pretraining, thus helping
reduce the training cost. Although large LMs have proven to learn medical
knowledge, they still require specialized domain adaptation for improved
downstream clinical tasks. By leveraging large real-world clinical corpora, we
first provided a fine-grained characterization of duplicates stemming from
common writing practices and clinical relevance. Second, we demonstrated that
deduplicating clinical text helps clinical LMs encode less redundant
information more efficiently and does not harm performance on classification
tasks via prompt-based learning.
Binary code summarization, while invaluable for understanding code semantics,
is challenging due to its labor-intensive nature. This study delves into the
potential of large language models (LLMs) for binary code comprehension. To
this end, we present BinSum, a comprehensive benchmark and dataset of over 557K
binary functions and introduce a novel method for prompt synthesis and
optimization. To more accurately gauge LLM performance, we also propose a new
semantic similarity metric that surpasses traditional exact-match approaches.
Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2,
and Code Llama, reveals 10 pivotal insights. The evaluation consumed 4
billion inference tokens and incurred a total expense of 11,418 US dollars and
873 NVIDIA A100 GPU hours. Our findings highlight both the transformative potential
of LLMs in this field and the challenges yet to be overcome.
Despite the remarkable advances in deep learning technology, achieving
satisfactory performance in lung sound classification remains a challenge due
to the scarcity of available data. Moreover, the respiratory sound samples are
collected from a variety of electronic stethoscopes, which could potentially
introduce biases into the trained models. When a significant distribution shift
occurs within the test dataset or in a practical scenario, it can substantially
decrease the performance. To tackle this issue, we introduce cross-domain
adaptation techniques, which transfer the knowledge from a source domain to a
distinct target domain. In particular, by considering different stethoscope
types as individual domains, we propose a novel stethoscope-guided supervised
contrastive learning approach. This method mitigates domain-related
disparities, enabling the model to distinguish respiratory sounds irrespective
of the recording stethoscope. The experimental results on the ICBHI
dataset demonstrate that the proposed methods are effective in reducing the
domain dependency, achieving an ICBHI score of 61.71%, a significant
improvement of 2.16% over the baseline.
Our study explores potential modifications of Inception-like architectures
within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a
novel network that strategically combines the strengths of InceptionTime and
channel attention mechanisms. Furthermore, we propose a training setup that
employs stabilization techniques aimed at tackling the formidable challenges of
the severely imbalanced PTB-XL dataset and of gradient corruption. By this
means, we set a new benchmark for supervised deep learning models across the
majority of tasks. Our model consistently surpasses InceptionTime by
substantial margins compared with other state-of-the-art methods in this
domain, most notably a 0.013 AUROC improvement on the "all" task, while also
mitigating the inherent dataset fluctuations during training.
$B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and
robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and
$B_0$ calibration scans can mitigate this but add scan time and cannot be
applied retrospectively to previously collected data. Here, we propose a
calibration-free, sequence-adaptive deep-learning framework to estimate and
correct for the $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its
capability on arbitrary MRF sequences at 3T, for which no training data were
previously obtained. Such an approach can be applied to any previously acquired
and future MRF scans. The flexibility in directly applying this framework to
other quantitative sequences is also highlighted.
Uncertainty Quantification (UQ) has gained traction in an attempt to fix the
black-box nature of Deep Learning. Specifically (medical) biosignals such as
electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG), and electromyography (EMG) could benefit from good UQ, since these suffer
from a poor signal-to-noise ratio, and good human interpretability is pivotal
for medical applications and Brain Computer Interfaces. In this paper, we
review the state of the art at the intersection of Uncertainty Quantification,
biosignals, and machine learning. We present various methods, shortcomings,
uncertainty measures and theoretical frameworks that currently exist in this
application domain. Overall it can be concluded that promising UQ methods are
available, but that research is needed on how people and systems may interact
with an uncertainty model in a (clinical) environment.
In this study, we propose an approach for predicting rare events by
exploiting coevolving time series. Our approach involves a weighted
autologistic regression model, where we leverage the temporal behavior of the
data to enhance predictive capabilities. By addressing the issue of imbalanced
datasets, we establish constraints leading to weight estimation and improved
performance. Evaluation on synthetic and real-world datasets confirms that our
approach outperforms state-of-the-art methods for predicting home equipment
failure.
This study introduces an innovative 3D printed dry electrode tailored for
biosensing in postoperative recovery scenarios. Fabricated through a drop
coating process, the electrode incorporates a novel 2D material.
Biased enhanced sampling methods utilizing collective variables (CVs) are
powerful tools for sampling conformational ensembles. Due to high intrinsic
dimensions, efficiently generating conformational ensembles for complex systems
requires enhanced sampling on high-dimensional free energy surfaces. While
methods like temperature-accelerated molecular dynamics (TAMD) can adopt many
CVs in a simulation, unbiasing the simulation requires accurate modeling of a
high-dimensional CV probability distribution, which is challenging for
traditional density estimation techniques. Here we propose an unbiasing method
based on the score-based diffusion model, a deep generative learning method
that excels in density estimation across complex data landscapes. We test the
score-based diffusion unbiasing method on TAMD simulations. The results
demonstrate that this unbiasing approach significantly outperforms traditional
unbiasing methods, and can generate accurate unbiased conformational ensembles
for simulations whose number of CVs exceeds the usual range.
( 2
min )
Catastrophic forgetting (CF) is a significant challenge in continual learning
(CL). In regularization-based approaches to mitigate CF, modifications to
important training parameters are penalized in subsequent tasks using an
appropriate loss function. We propose the RTRA, a modification to the widely
used Elastic Weight Consolidation (EWC) regularization scheme, using the
Natural Gradient for loss function optimization. Our approach improves the
training of regularization-based methods without sacrificing test-data
performance. We compare the proposed RTRA approach against EWC using the
iFood251 dataset. We show that RTRA has a clear edge over the state-of-the-art
approaches.
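For reference, the EWC penalty that RTRA builds on can be written down in a few lines (a minimal numpy sketch of standard EWC only; the RTRA-specific natural-gradient optimization is not shown, and all numbers are illustrative):

```python
import numpy as np

def ewc_loss(task_loss, theta, theta_star, fisher_diag, lam=1.0):
    """Elastic Weight Consolidation objective: the current task's loss plus
    a quadratic penalty anchoring parameters that were important to earlier
    tasks (importance = diagonal Fisher information)."""
    penalty = 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)
    return float(task_loss + penalty)

theta_star = np.array([1.0, -2.0])   # parameters learned on task 1
fisher = np.array([5.0, 0.1])        # parameter 0 mattered much more
theta = np.array([1.5, 0.0])         # candidate parameters while on task 2
total = ewc_loss(0.3, theta, theta_star, fisher)
```

Moving the important parameter (index 0) is penalized heavily; moving the unimportant one barely registers.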
( 2
min )
Rehearsal-based techniques are commonly used to mitigate catastrophic
forgetting (CF) in Incremental learning (IL). The quality of the exemplars
selected is important for this purpose and most methods do not ensure the
appropriate diversity of the selected exemplars. We propose a new technique
"DSS" -- Diverse Selection of Samples from the input data stream in the
Class-incremental learning (CIL) setup under both disjoint and fuzzy task
boundary scenarios. Our method outperforms state-of-the-art methods and is much
simpler to understand and implement.
( 2
min )
We propose a novel exemplar selection approach based on Principal Component
Analysis (PCA) and median sampling, and a neural network training regime in the
setting of class-incremental learning. This approach avoids the pitfalls due to
outliers in the data and is both simple to implement and use across various
incremental machine learning models. It also has independent usage as a
sampling algorithm. We achieve better performance compared to state-of-the-art
methods.
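One plausible reading of the PCA-and-median idea can be sketched as follows (a hypothetical illustration, not the authors' exact procedure: samples are ranked by how close their first-principal-component projection lies to the median projection, so extreme outliers are never selected):

```python
import numpy as np

def pca_median_exemplars(X, k):
    """Keep the k samples whose projection onto the first principal
    component is closest to the median projection; outliers project far
    from the median and are excluded."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]                                 # first-PC scores
    order = np.argsort(np.abs(proj - np.median(proj)))
    return order[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[0] += 50.0                                          # plant an extreme outlier
idx = pca_median_exemplars(X, k=10)
```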
( 2
min )
The goal of this series is to chronicle opinions and issues in the field of
machine learning as they stand today and as they change over time. The plan is
to host this survey periodically until the AI singularity
paperclip-frenzy-driven doomsday, keeping an updated list of topical questions
and interviewing new community members for each edition. In this issue, we
probed people's opinions on interpretable AI, the value of benchmarking in
modern NLP, the state of progress towards understanding deep learning, and the
future of academia.
( 2
min )
In this survey, we examine algorithms for conducting credit assignment in
artificial neural networks that are inspired or motivated by neurobiology,
unifying these various processes under one possible taxonomy. Our proposed
taxonomy is constructed based on how a learning algorithm answers a central
question underpinning the mechanisms of synaptic plasticity in complex adaptive
neuronal systems: where do the signals that drive the learning in individual
elements of a network come from and how are they produced? In this unified
treatment, we organize the ever-growing set of brain-inspired learning
processes into six general families and consider these in the context of
backpropagation of errors and its known criticisms. The results of this review
are meant to encourage future developments in neuro-mimetic systems and their
constituent learning processes, wherein lies the opportunity to build a strong
bridge between machine learning, computational neuroscience, and cognitive
science.
( 2
min )
In this paper we consider the adversarial contextual bandit problem in metric
spaces. The paper "Nearest neighbour with bandit feedback" tackled this
problem, but suffers from high regret when there are many contexts near the
decision boundary of the comparator policy. In this paper we eradicate this
problem,
designing an algorithm in which we can hold out any set of contexts when
computing our regret term. Our algorithm builds on that of "Nearest neighbour
with bandit feedback" and hence inherits its extreme computational efficiency.
( 2
min )
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof-of-concept and we hope will inspire future developments
towards computationally efficient IRL.
( 2
min )
There have been claims that artificial intelligence is bringing about increased productivity, accuracy, and a smarter workplace. In all of this excitement, it is difficult to differentiate between fact and fantasy. When it comes to the management of workforces, what is the truth? Within the context of real-world applications, how much hype is there?
The post How can data science and AI help HR in workforce development, evaluation, and retention? appeared first on Data Science Central.
( 29
min )
Artificial intelligence (AI) is one of the most transformational technologies of our generation and provides opportunities to be a force for good and drive economic growth. The growth of large language models (LLMs), with hundreds of billions of parameters, has unlocked new generative AI use cases to improve customer experiences, boost employee productivity, and so […]
( 4
min )
This is a guest post co-written with Babu Srinivasan from MongoDB. As industries evolve in today’s fast-paced business landscape, the inability to have real-time forecasts poses significant challenges for industries heavily reliant on accurate and timely insights. The absence of real-time forecasts in various industries presents pressing business challenges that can significantly impact decision-making and […]
( 8
min )
In this episode of “AI Frontiers,” AI4Science Director Chris Bishop talks about the state of deep learning; his new textbook, “Deep Learning: Foundations and Concepts,” and the impact the field is having on the natural sciences.
The post AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop appeared first on Microsoft Research.
( 24
min )
Bilevel optimization has received more and more attention recently due to its
wide applications in machine learning. In this paper, we consider bilevel
optimization in decentralized networks. In particular, we propose a novel
single-loop algorithm for solving decentralized bilevel optimization with
strongly convex lower level problem. Our algorithm is fully single-loop and
does not require heavy matrix-vector multiplications when approximating the
hypergradient. Moreover, unlike existing methods for decentralized bilevel
optimization and federated bilevel optimization, our algorithm does not require
any gradient heterogeneity assumption. Our analysis shows that the proposed
algorithm achieves a sublinear convergence rate. Experimental results on
hyperparameter optimization problem with both synthetic and MNIST data sets
demonstrate the efficiency of the proposed algorithm.
( 2
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” I outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure that outcome.
The post AI and Justice in a Brave New World: Part 3 – AI Governance appeared first on Data Science Central.
( 23
min )
In recent years, Transformer-based auto-attention mechanisms have been
successfully applied to the analysis of a variety of context-reliant data
types, from texts to images and beyond, including data from non-Euclidean
geometries. In this paper, we present such a mechanism, designed to classify
sequences of Symmetric Positive Definite matrices while preserving their
Riemannian geometry throughout the analysis. We apply our method to automatic
sleep staging on timeseries of EEG-derived covariance matrices from a standard
dataset, obtaining high levels of stage-wise performance.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by including in the training phase
simultaneously (i) physical dependencies between spatial loss field and (ii)
measured pathloss values in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
Real-time monitoring of human behaviours, especially in e-Health
applications, has been an active area of research in the past decades. On top
of IoT-based sensing environments, anomaly detection algorithms have been
proposed for the early detection of abnormalities. Gradual change procedures,
commonly referred to as drift anomalies, have received much less attention in
the literature because they represent a much more challenging scenario than
sudden temporary changes (point anomalies). In this paper, we propose, for the
first time, a fully unsupervised real-time drift detection algorithm named
DynAmo, which can identify drift periods as they are happening. DynAmo
comprises a dynamic clustering component to capture the overall trends of
monitored behaviours and a trajectory generation component, which extracts
features from the densest cluster centroids. Finally, we apply an ensemble of
divergence tests on sliding reference and detection windows to detect drift
periods in the behavioural sequence.
( 2
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
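The accept-reject filter described here is the standard Metropolis adjustment; the sketch below shows it with the plain Euclidean Langevin proposal on an unconstrained Gaussian target, whereas the paper applies the same filter to the mirror Langevin proposal on a compact convex support:

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, h, rng):
    """One Metropolis-adjusted Langevin step: a discretised Langevin
    proposal followed by an accept-reject filter, which removes the
    asymptotic bias of the unadjusted discretisation."""
    mean_x = x + h * grad_log_pi(x)
    y = mean_x + np.sqrt(2.0 * h) * rng.normal(size=x.shape)
    mean_y = y + h * grad_log_pi(y)
    # Gaussian proposal log-densities; normalising constants cancel
    log_q_back = -np.sum((x - mean_y) ** 2) / (4.0 * h)
    log_q_fwd = -np.sum((y - mean_x) ** 2) / (4.0 * h)
    log_alpha = log_pi(y) - log_pi(x) + log_q_back - log_q_fwd
    return y if np.log(rng.uniform()) < log_alpha else x

# Sample a standard Gaussian as an easily checkable target.
log_pi = lambda x: -0.5 * np.sum(x ** 2)
grad_log_pi = lambda x: -x
rng = np.random.default_rng(0)
x, samples = np.zeros(1), []
for _ in range(5000):
    x = mala_step(x, log_pi, grad_log_pi, h=0.5, rng=rng)
    samples.append(x[0])
mean_est = float(np.mean(samples[1000:]))
var_est = float(np.var(samples[1000:]))
```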
( 2
min )
(1) The enhanced capability of Graph Neural Networks (GNNs) in unsupervised
community detection of clustered nodes is attributed to their capacity to
encode both the connectivity and feature information spaces of graphs. The
identification of latent communities holds practical significance in various
domains, from social networks to genomics. Current real-world performance
benchmarks are perplexing due to the multitude of decisions influencing GNN
evaluations for this task. (2) Three metrics are compared to assess the
consistency of algorithm rankings in the presence of randomness. The
consistency and quality of performance between the results under a
hyperparameter optimisation with the default hyperparameters is evaluated. (3)
The results compare hyperparameter optimisation with default hyperparameters,
revealing a significant performance loss when neglecting hyperparameter
investigation. A comparison of metrics indicates that ties in ranks can
substantially alter the quantification of randomness. (4) Ensuring adherence to
the same evaluation criteria may result in notable differences in the reported
performance of methods for this task. The $W$ Randomness coefficient, based on
the Wasserstein distance, is identified as providing the most robust assessment
of randomness.
( 3
min )
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems,
where a central operator assigns vehicles to customer requests or rejects these
with the aim of maximizing its total profit. Recent approaches use multi-agent
deep reinforcement learning (MADRL) to realize scalable yet performant
algorithms, but train agents based on local rewards, which distorts the reward
signal with respect to the system-wide profit, leading to lower performance. We
therefore propose a novel global-rewards-based MADRL algorithm for vehicle
dispatching in AMoD systems, which resolves so far existing goal conflicts
between the trained agents and the operator by assigning rewards to agents
leveraging a counterfactual baseline. Our algorithm shows statistically
significant improvements across various settings on real-world data compared to
state-of-the-art MADRL algorithms with local rewards. We further provide a
structural analysis which shows that the utilization of global rewards can
improve implicit vehicle balancing and demand forecasting abilities. Our code
is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
( 2
min )
We propose a framework that leverages foundation models as teachers, guiding
a reinforcement learning agent to acquire semantically meaningful behavior
without human feedback. In our framework, the agent receives task instructions
grounded in a training environment from large language models. Then, a
vision-language model guides the agent in learning the multi-task
language-conditioned policy by providing reward feedback. We demonstrate that
our method can learn semantically meaningful skills in a challenging open-ended
MineDojo environment while prior unsupervised skill discovery methods struggle.
Additionally, we discuss observed challenges of using off-the-shelf foundation
models as teachers and our efforts to address them.
( 2
min )
We present several methods for predicting the dynamics of Hamiltonian systems
from discrete observations of their vector field. Each method is either
informed or uninformed of the Hamiltonian property. We empirically and
comparatively evaluate the methods and observe that knowledge that the system
is Hamiltonian can be exploited effectively, and that different methods strike
different trade-offs between efficiency and effectiveness for different
dynamical systems.
( 2
min )
In real-world scenarios, classification models are often required to perform
robustly when predicting samples belonging to classes that have not appeared
during their training stage. Open Set Recognition addresses this issue by
devising models capable of detecting unknown classes from samples arriving
during the testing phase, while maintaining a good level of performance in the
classification of samples belonging to known classes. This review
comprehensively overviews the recent literature related to Open Set
Recognition, identifying common practices, limitations, and connections of this
field with other machine learning research areas, such as continual learning,
out-of-distribution detection, novelty detection, and uncertainty estimation.
Our work also uncovers open problems and suggests several research directions
that may motivate and articulate future efforts towards safer Artificial
Intelligence methods.
( 2
min )
Humanoid robots will be able to assist humans in their daily life, in
particular due to their versatile action capabilities. However, while these
robots need a certain degree of autonomy to learn and explore, they also should
respect various constraints, for access control and beyond. We explore the
novel field of incorporating privacy, security, and access control constraints
with robot task planning approaches. We report preliminary results on the
classical symbolic approach, deep-learned neural networks, and modern ideas
using large language models as knowledge base. From analyzing their trade-offs,
we conclude that a hybrid approach is necessary, and thereby present a new use
case for the emerging field of neuro-symbolic artificial intelligence.
( 2
min )
In continual learning, networks confront a trade-off between stability and
plasticity when trained on a sequence of tasks. To bolster plasticity without
sacrificing stability, we propose a novel training algorithm called LRFR. This
approach optimizes network parameters in the null space of the past tasks'
feature representation matrix to guarantee stability. Concurrently, we
judiciously select only a subset of neurons in each layer of the network while
training individual tasks to learn the past tasks' feature representation
matrix in low-rank. This increases the null space dimension when designing
network parameters for subsequent tasks, thereby enhancing the plasticity.
Using CIFAR-100 and TinyImageNet as benchmark datasets for continual learning,
the proposed approach consistently outperforms state-of-the-art methods.
( 2
min )
We propose HAROOD as a short-range FMCW radar-based human activity classifier
and out-of-distribution (OOD) detector. It aims to classify human sitting,
standing, and walking activities and to detect any other moving or stationary
object as OOD. We introduce a two-stage network. The first stage is trained
with a novel loss function that includes intermediate reconstruction loss,
intermediate contrastive loss, and triplet loss. The second stage uses the
first stage's output as its input and is trained with cross-entropy loss. It
creates a simple classifier that performs the activity classification. On our
dataset collected by 60 GHz short-range FMCW radar, we achieve an average
classification accuracy of 96.51%. Also, we achieve an average AUROC of 95.04%
as an OOD detector. Additionally, our extensive evaluations demonstrate the
superiority of HAROOD over the state-of-the-art OOD detection methods in terms
of standard OOD detection metrics.
( 2
min )
We address the Continual Learning (CL) problem, where a model has to learn a
sequence of tasks from non-stationary distributions while preserving prior
knowledge as it encounters new experiences. With the advancement of foundation
models, CL research has shifted focus from the initial learning-from-scratch
paradigm to the use of generic features from large-scale pre-training. However,
existing approaches to CL with pre-trained models only focus on separating the
class-specific features from the final representation layer and neglect the
power of intermediate representations that capture low- and mid-level features
naturally more invariant to domain shifts. In this work, we propose LayUP, a
new class-prototype-based approach to continual learning that leverages
second-order feature statistics from multiple intermediate layers of a
pre-trained network. Our method is conceptually simple, does not require any
replay buffer, and works out of the box with any foundation model. LayUP
improves over the state-of-the-art on four of the seven class-incremental
learning settings at a considerably reduced memory and computational footprint
compared with the next best baseline. Our results demonstrate that fully
exploiting the representational capacity of pre-trained models in CL requires
looking far beyond their final embeddings.
( 2
min )
Deep Reinforcement Learning (DRL) has achieved remarkable advances in
sequential decision tasks. However, recent works have revealed that DRL agents
are susceptible to slight perturbations in observations. This vulnerability
raises concerns regarding the effectiveness and robustness of deploying such
agents in real-world applications. In this work, we propose a novel robust
reinforcement learning method called SortRL, which improves the robustness of
DRL policies against observation perturbations from the perspective of the
network architecture. We employ a novel architecture for the policy network
that incorporates global $l_\infty$ Lipschitz continuity and provide a
convenient method to enhance policy robustness based on the output margin.
Besides, a training framework is designed for SortRL, which solves given tasks
while maintaining robustness against $l_\infty$ bounded perturbations on the
observations. Several experiments are conducted to evaluate the effectiveness
of our method, including classic control tasks and video games. The results
demonstrate that SortRL achieves state-of-the-art robustness performance
against different perturbation strengths.
( 2
min )
Many neural network architectures have been shown to be Turing Complete, and
can thus implement arbitrary algorithms. However, Transformers are unique in
that they can implement gradient-based learning algorithms \emph{under simple
parameter configurations}. A line of recent work shows that linear Transformers
naturally learn to implement gradient descent (GD) when trained on a linear
regression in-context learning task. But the linearity assumption (either in
the Transformer architecture or in the learning task) is far from realistic
settings where non-linear activations crucially enable Transformers to learn
complicated non-linear functions. In this paper, we provide theoretical and
empirical evidence that non-linear Transformers can, and \emph{in fact do},
learn to implement learning algorithms to learn non-linear functions in
context. Our results apply to a broad class of combinations of non-linear
architectures, and non-linear in-context learning tasks. Interestingly, we show
that the optimal choice of non-linear activation depends in a natural way on
the non-linearity of the learning task.
( 2
min )
Melanoma is a type of cancer that begins in the cells controlling the pigment
of the skin, and it is often referred to as the most dangerous skin cancer.
Diagnosing melanoma can be time-consuming, and a recent increase in melanoma
incidents indicates a growing demand for a more efficient diagnostic process.
This paper presents a pipeline for melanoma diagnostics, leveraging two
convolutional neural networks, a diagnosis, and a prognosis model. The
diagnostic model is responsible for localizing malignant patches across whole
slide images and delivering a patient-level diagnosis as malignant or benign.
Further, the prognosis model utilizes the diagnostic model's output to provide
a patient-level prognosis as good or bad. The full pipeline has an F1 score of
0.79 when tested on data from the same distribution as it was trained on.
( 2
min )
Polyp segmentation, a contentious issue in medical imaging, has seen numerous
proposed methods aimed at improving the quality of segmented masks. Currently,
state-of-the-art techniques yield impressive results. However, the sheer size
of these models poses challenges for practical industry applications. To
address this, we present a Knowledge Distillation framework, incorporating
attention supervision and the symmetrical guiding method. This framework is
designed to facilitate knowledge transfer from a teacher model to a more
compact student model with fewer parameters. Our experimental evaluation of the
framework assesses its effectiveness in enabling the student model to acquire
knowledge from the teacher efficiently. Additionally, our method serves to
prevent the student model from incorporating redundant features that could lead
to inaccurate predictions. Consequently, our method, boasting approximately 5
million parameters, achieves competitive results comparable to the
state-of-the-art approaches. The implementation can be found at:
https://github.com/huyquoctrinh/KDAS3
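The knowledge-transfer core of such a framework is typically the classic temperature-softened distillation loss (Hinton et al.); the sketch below shows only that generic loss, not the paper's attention supervision or symmetrical guiding method:

```python
import numpy as np

def softened(logits, T):
    """Temperature-softened softmax (numerically stabilised)."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style logit distillation: KL divergence between
    temperature-softened teacher and student distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p_t = softened(teacher_logits, T)
    p_s = softened(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(T * T * kl.mean())

teacher = np.array([[4.0, 1.0, -1.0]])
aligned = kd_loss(np.array([[4.0, 1.0, -1.0]]), teacher)  # identical logits
off = kd_loss(np.array([[-1.0, 1.0, 4.0]]), teacher)      # reversed logits
```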
( 2
min )
In this work, we formally prove that, under certain conditions, if a neural
network is invariant to a finite group then its weights recover the Fourier
transform on that group. This provides a mathematical explanation for the
emergence of Fourier features -- a ubiquitous phenomenon in both biological and
artificial learning systems. The results hold even for non-commutative groups,
in which case the Fourier transform encodes all the irreducible unitary group
representations. Our findings have consequences for the problem of symmetry
discovery. Specifically, we demonstrate that the algebraic structure of an
unknown group can be recovered from the weights of a network that is at least
approximately invariant within certain bounds. Overall, this work contributes
to a foundation for an algebraic learning theory of invariant neural network
representations.
( 2
min )
This article presents a new methodology for extracting intervals when a home
is vacant from low-frequency electricity consumption data. The approach
combines multiple algorithms, including change point detection, classification,
period detection, and periodic spikes retrieval. It shows encouraging results
on both simulated and real consumption curves. This approach offers practical
insights for optimizing energy use and holds potential benefits for residential
consumers and utility companies in terms of energy cost reduction and
sustainability. Further research is needed to enhance its applicability in
diverse settings and with larger datasets.
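The change point component of such a pipeline can be illustrated with an elementary single-break detector on a synthetic consumption curve (an illustrative stand-in, not the authors' algorithm; the occupancy levels and noise are invented):

```python
import numpy as np

def mean_shift_changepoint(x):
    """Locate a single mean-shift change point by minimising the summed
    within-segment variance over all split positions."""
    n = len(x)
    best_k, best_cost = None, np.inf
    for k in range(2, n - 2):
        cost = k * x[:k].var() + (n - k) * x[k:].var()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

rng = np.random.default_rng(0)
occupied = rng.normal(1.0, 0.1, 120)   # appliances in use
vacant = rng.normal(0.2, 0.05, 80)     # base load only
k = mean_shift_changepoint(np.concatenate([occupied, vacant]))
```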
( 2
min )
In various scientific and engineering applications, there is typically an
approximate model of the underlying complex system, even though it contains
both aleatoric and epistemic uncertainties. In this paper, we present a
principled method to incorporate these approximate models as physics priors in
modeling, to prevent overfitting and enhance the generalization capabilities
of the trained models. Utilizing the structural risk minimization (SRM)
inductive principle pioneered by Vapnik, this approach structures the physics
priors into generalized regularizers. The experimental results demonstrate that
our method achieves up to two orders of magnitude of improvement in testing
accuracy.
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the true solution is close to that of the SDE with
parameters approximated by our neural network. Our work contributes to SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
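For the constant-parameter case, the maximum-likelihood formulation has a closed form under the Euler-Maruyama transition density; this sketch shows that baseline (the paper's contribution, time-dependent parameters estimated by a neural network, is not reproduced here):

```python
import numpy as np

def ml_drift_estimate(x, dt):
    """Closed-form maximum-likelihood drift for dX = theta*X dt + sigma dW
    under the Euler-Maruyama transition X_{t+1} ~ N(X_t + theta*X_t*dt,
    sigma^2*dt): the Gaussian negative log-likelihood is quadratic in theta."""
    dx = np.diff(x)
    xt = x[:-1]
    return float(np.sum(xt * dx) / (dt * np.sum(xt ** 2)))

# Simulate a mean-reverting path with known theta and recover it.
rng = np.random.default_rng(0)
dt, theta, sigma = 0.01, -0.5, 0.2
path = [1.0]
for _ in range(20000):
    path.append(path[-1] + theta * path[-1] * dt
                + sigma * np.sqrt(dt) * rng.normal())
theta_hat = ml_drift_estimate(np.array(path), dt)
```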
( 2
min )
We introduce a new framework to detect perceptual bugs using a Long
Short-Term Memory (LSTM) network, which detects bugs in video games as
anomalies. The detected buggy frames are then clustered to determine the
category of the bug that occurred. The framework was evaluated on two First Person
Shooter (FPS) games. Results show the effectiveness of the framework.
( 2
min )
Cardiovascular diseases, particularly heart failure, are a leading cause of
death globally. The early detection of heart failure through routine
echocardiogram screenings is often impeded by the high cost and labor-intensive
nature of these procedures, a barrier that can mean the difference between life
and death. This paper presents ConFormer, a novel deep learning model designed
to automate the estimation of Ejection Fraction (EF) and Left Ventricular Wall
Thickness from echocardiograms. The implementation of ConFormer has the
potential to enhance preventative cardiology by enabling cost-effective,
accessible, and comprehensive heart health monitoring, thereby saving countless
lives. The source code is available at https://github.com/Aether111/ConFormer.
( 2
min )
Hypernetworks are meta neural networks that generate weights for a main
neural network in an end-to-end differentiable manner. Despite extensive
applications ranging from multi-task learning to Bayesian deep learning, the
problem of optimizing hypernetworks has not been studied to date. We observe
that classical weight initialization methods like Glorot & Bengio (2010) and He
et al. (2015), when applied directly on a hypernet, fail to produce weights for
the mainnet in the correct scale. We develop principled techniques for weight
initialization in hypernets, and show that they lead to more stable mainnet
weights, lower training loss, and faster convergence.
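The scale mismatch can be seen in a toy linear hypernetwork: with an ordinary fan-in initialization of the hypernet's output layer, the generated mainnet weights come out with variance near 1 rather than the roughly 1/fan_in that a He- or Glorot-style scheme targets (a simplified illustration of the observation; the paper's techniques are more general):

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, fan_in, fan_out = 64, 256, 256
n_weights = fan_in * fan_out
e = rng.normal(size=embed_dim)  # embedding fed to a linear hypernet

# Naive: initialize the hypernet's output layer with a standard fan-in
# scheme. The generated mainnet weights then have variance roughly
# ||e||^2 / embed_dim (about 1), far from the mainnet's target of 1/fan_in.
H_naive = rng.normal(scale=1.0 / np.sqrt(embed_dim),
                     size=(n_weights, embed_dim))
var_naive = float((H_naive @ e).var())

# Hypernet-aware: fold the mainnet's target variance into the hypernet init.
H_fixed = rng.normal(scale=1.0 / np.sqrt(embed_dim * fan_in),
                     size=(n_weights, embed_dim))
var_fixed = float((H_fixed @ e).var())

target = 1.0 / fan_in
```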
( 2
min )
In this paper, we propose a novel personalized decision support system that
combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning
(XRL) to provide effective and interpretable interventions. Our method
leverages DRL to provide expert action recommendations while incorporating ToM
modeling to understand users' mental states and predict their future actions,
enabling appropriate timing for intervention. To explain interventions, we use
counterfactual explanations based on RL's feature importance and users' ToM
model structure. Our proposed system generates accurate and personalized
interventions that are easily interpretable by end-users. We demonstrate the
effectiveness of our approach through a series of crowd-sourcing experiments in
a simulated team decision-making task, where our system outperforms control
baselines in terms of task performance. Our proposed approach is agnostic to
task environment and RL model structure, therefore has the potential to be
generalized to a wide range of applications.
( 2
min )
In many applications, such as scientific literature management, researcher
search, and social network analysis, Name Disambiguation (aiming at
disambiguating WhoIsWho) has been a challenging problem. In addition, the
growth of scientific literature makes the problem more difficult and urgent.
Although name disambiguation has been extensively studied in academia and
industry, the problem has not been solved well due to the clutter of data and
the complexity of the same name scenario. In this work, we aim to explore
models that can perform the task of name disambiguation using the network
structure that is intrinsic to the problem and present an analysis of the
models.
( 2
min )
The high dimensionality and complexity of neuroimaging data necessitate large
datasets to develop robust and high-performing deep learning models. However,
the neuroimaging field is notably hampered by the scarcity of such datasets. In
this work, we proposed a data augmentation and validation framework that
utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to
enrich datasets. We extended multivariate time series data by predicting the
time courses of independent component networks (ICNs) in both one-step and
recursive configurations. The effectiveness of these augmented datasets was
then compared with the original data using various deep learning models
designed for chronological age prediction tasks. The results suggest that our
approach improves model performance, providing a robust solution to overcome
the challenges presented by the limited size of neuroimaging datasets.
( 2
min )
Motivated by policy gradient methods in the context of reinforcement
learning, we derive the first large deviation rate function for the iterates
generated by stochastic gradient descent for possibly non-convex objectives
satisfying a Polyak-Lojasiewicz condition. Leveraging the contraction principle
from large deviations theory, we illustrate the potential of this result by
showing how convergence properties of policy gradient with a softmax
parametrization and an entropy regularized objective can be naturally extended
to a wide spectrum of other policy parametrizations.
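For reference, the Polyak-Lojasiewicz (PL) condition assumed above states that the gradient dominates the suboptimality gap:

$$\frac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^\ast\bigr) \qquad \text{for all } x,$$

for some $\mu > 0$, where $f^\ast$ is the minimum value. The condition holds for some non-convex objectives (including entropy-regularized softmax policy objectives) and suffices for linear convergence of gradient methods, which is what makes it a natural setting for the large-deviation analysis.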
( 2
min )
We study Off-Policy Evaluation (OPE) in contextual bandit settings with large
action spaces. The benchmark estimators suffer from severe bias and variance
tradeoffs. Parametric approaches suffer from bias due to the difficulty of
specifying the correct model, whereas importance-weighting approaches suffer
from variance. To
overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was
proposed to mitigate the estimator's variance via embeddings of an action.
Nevertheless, MIPS is unbiased only under the no-direct-effect assumption,
which requires that the action embedding completely mediates the effect of an
action on a reward.
To overcome the dependency on these unrealistic assumptions, we propose a
Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the
proposed estimator is unbiased under weaker assumptions than MIPS while
reducing the variance relative to MIPS. Empirical experiments verify the
superiority of MDR over existing estimators with large action spaces.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by simultaneously including in the
training phase (i) physical dependencies within the spatial loss field and
(ii) pathloss values measured in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors behave
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
( 2
min )
Gaussian process regression is a classical kernel method for function
estimation and data interpolation. In large data applications, computational
costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the
interpolation error. We introduce a unified framework to analyze Gaussian
process regression under important classes of computational misspecification:
Karhunen-Lo\`eve expansions that result in low-rank kernel approximations,
multiscale wavelet expansions that induce sparsity in the covariance matrix,
and finite element representations that induce sparsity in the precision
matrix. Our theory also accounts for epistemic misspecification in the choice
of kernel parameters.
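The effect of a low-rank kernel approximation on the interpolant can be seen in a small numpy sketch (a generic illustration, not the paper's framework): a truncated eigendecomposition of the kernel matrix serves as a discrete stand-in for a truncated Karhunen-Loeve expansion.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(X, Y, ell=0.5):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (X[:, None] - Y[None, :]) ** 2 / ell**2)

X = np.sort(rng.uniform(0, 1, 40))      # training inputs
y = np.sin(2 * np.pi * X)               # training targets
Xs = np.linspace(0, 1, 200)             # prediction grid
noise = 1e-4

K = rbf_kernel(X, X)
Ks = rbf_kernel(Xs, X)

# Exact GP posterior mean
mean_exact = Ks @ np.linalg.solve(K + noise * np.eye(len(X)), y)

# Rank-r kernel approximation: keep only the top-r eigenpairs
w, V = np.linalg.eigh(K)                # ascending eigenvalues
r = 20
Vr, wr = V[:, -r:], w[-r:]
K_r = (Vr * wr) @ Vr.T
mean_lowrank = Ks @ np.linalg.solve(K_r + noise * np.eye(len(X)), y)

err = np.max(np.abs(mean_exact - mean_lowrank))
print(err)  # small: the discarded tail of the spectrum barely moves the interpolant
```

For a smooth kernel the spectrum decays quickly, so the interpolation error introduced by truncation is governed by the discarded eigenvalues, which is exactly the kind of misspecification the unified framework quantifies.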
( 2
min )
This paper considers the problem of evaluating an autonomous system's
competency in performing a task, particularly when working in dynamic and
uncertain environments. The inherent opacity of machine learning models, often
described as a `black box' from the user's perspective, poses a challenge. To
overcome this, we propose using a measure called the Surprise
index, which leverages available measurement data to quantify whether the
dynamic system performs as expected. We show that the surprise index can be
computed in closed form for dynamic systems when the joint distribution of the
observed evidence in a probabilistic model follows a multivariate Gaussian. We
then apply it to a nonlinear
spacecraft maneuver problem, where actions are chosen by a reinforcement
learning agent and show it can indicate how well the trajectory follows the
required orbit.
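The abstract does not give the closed form, but one classical surprise measure with a Gaussian closed form is Weaver's surprise index $S = \mathbb{E}[p(X)]/p(e)$; the paper's exact definition may differ, so treat this as an assumption. For a multivariate Gaussian it reduces to $2^{-d/2} e^{m/2}$ with $m$ the squared Mahalanobis distance of the evidence.

```python
import numpy as np

def surprise_index(e, mu, Sigma):
    """Weaver's surprise index S = E[p(X)] / p(e) for a d-dimensional
    Gaussian. Using E[p(X)] = (2*pi)^(-d/2) |Sigma|^(-1/2) 2^(-d/2),
    the normalizing constants cancel, leaving S = 2^(-d/2) * exp(m/2)."""
    d = len(mu)
    diff = np.asarray(e, dtype=float) - np.asarray(mu, dtype=float)
    m = diff @ np.linalg.solve(Sigma, diff)  # squared Mahalanobis distance
    return 2.0 ** (-d / 2) * np.exp(m / 2.0)

mu = np.zeros(2)
Sigma = np.eye(2)
print(surprise_index(mu, mu, Sigma))          # at the mode: least surprising
print(surprise_index([3.0, 3.0], mu, Sigma))  # deep in the tail: large surprise
```

Evidence near the mode yields a small index while tail events blow it up, which is the behavior needed to flag trajectories that deviate from the expected orbit.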
( 2
min )
Predictive Process Monitoring (PPM) aims at leveraging historic process
execution data to predict how ongoing executions will continue up to their
completion. In recent years, PPM techniques for the prediction of the next
activities have matured significantly, mainly thanks to the use of Neural
Networks (NNs) as a predictor. While their performance is difficult to beat in
the general case, there are specific situations where background process
knowledge can be helpful. Such knowledge can be leveraged for improving the
quality of predictions for exceptional process executions or when the process
changes due to a concept drift. In this paper, we present a Symbolic[Neuro]
system that leverages background knowledge expressed in terms of a procedural
process model to offset the under-sampling in the training data. More
specifically, we make predictions using NNs with attention mechanism, an
emerging technology in the NN field. The system has been tested on several
real-life logs showing an improvement in the performance of the prediction
task.
( 2
min )
A large amount of effort has recently been put into understanding the barren
plateau phenomenon. In this perspective article, we face the increasingly loud
elephant in the room and ask a question that has been hinted at by many but not
explicitly addressed: Can the structure that allows one to avoid barren
plateaus also be leveraged to efficiently simulate the loss classically? We
present strong evidence that commonly used models with provable absence of
barren plateaus are also classically simulable, provided that one can collect
some classical data from quantum devices during an initial data acquisition
phase. This follows from the observation that barren plateaus result from a
curse of dimensionality, and that current approaches for solving them end up
encoding the problem into some small, classically simulable, subspaces. This
sheds serious doubt on the non-classicality of the information processing
capabilities of parametrized quantum circuits for barren plateau-free
landscapes and on the possibility of superpolynomial advantages from running
them on quantum hardware. We end by discussing caveats in our arguments, the
role of smart initializations, and by highlighting new opportunities that our
perspective raises.
( 3
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
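The accept-reject construction can be sketched in its simplest Euclidean form (plain MALA on an unconstrained Gaussian target; the mirror map and constrained support of the actual algorithm are omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)

def mala(grad_logpi, logpi, x0, step, n):
    """Metropolis-adjusted Langevin: one Langevin step as the proposal,
    then an accept-reject filter that removes the discretisation bias."""
    x, samples = x0, []
    for _ in range(n):
        noise = rng.normal(size=x.shape)
        y = x + step * grad_logpi(x) + np.sqrt(2 * step) * noise

        # log density of the Langevin proposal kernel q(b | a)
        def logq(b, a):
            return -np.sum((b - a - step * grad_logpi(a)) ** 2) / (4 * step)

        log_alpha = logpi(y) - logpi(x) + logq(x, y) - logq(y, x)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        samples.append(x.copy())
    return np.array(samples)

# Simple stand-in target: a standard 1-D Gaussian
logpi = lambda x: -0.5 * np.sum(x**2)
grad = lambda x: -x
s = mala(grad, logpi, np.zeros(1), step=0.5, n=20000)
print(s.mean(), s.var())  # close to the target's mean 0 and variance 1
```

Without the filter, the same Langevin discretisation targets a biased stationary distribution; the Metropolis correction makes the chain exactly reversible with respect to the target, which is the property the mixing-time analysis exploits.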
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the true solution is close to that of the SDE whose
parameters are approximated by our neural network. Our work contributes to
SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
( 2
min )
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical […]
( 9
min )
“Minimum viewing time” benchmark gauges image recognition complexity for AI systems by measuring the time needed for accurate human identification.
( 11
min )
Using generative AI, MIT chemists created a model that can predict the structures formed when a chemical reaction reaches its point of no return.
( 9
min )
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we will discuss three types of applications, namely, RL as an
alternative way for generation without specified objectives; as a way for
generating outputs while concurrently maximizing an objective function; and,
finally, as a way of embedding desired characteristics, which cannot be easily
captured by means of an objective function, into the generative process. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
( 2
min )
In distributed training, communication often emerges as a bottleneck. In
response, we introduce Kimad, a solution that offers adaptive gradient
compression. By consistently monitoring bandwidth, Kimad refines compression
ratios to match specific neural network layer requirements. Our exhaustive
tests and proofs confirm Kimad's outstanding performance, establishing it as a
benchmark in adaptive compression for distributed deep learning.
( 2
min )
Quantum neural networks (QNNs) and quantum kernels stand as prominent figures
in the realm of quantum machine learning, poised to leverage the nascent
capabilities of near-term quantum computers to surmount classical machine
learning challenges. Nonetheless, the training efficiency challenge poses a
limitation on both QNNs and quantum kernels, curbing their efficacy when
applied to extensive datasets. To confront this concern, we present a unified
approach: coreset selection, aimed at expediting the training of QNNs and
quantum kernels by distilling a judicious subset from the original training
dataset. Furthermore, we analyze the generalization error bounds of QNNs and
quantum kernels when trained on such coresets, unveiling the comparable
performance with those training on the complete original dataset. Through
systematic numerical simulations, we illuminate the potential of coreset
selection in expediting tasks encompassing synthetic data classification,
identification of quantum correlations, and quantum compiling. Our work offers
a useful way to improve diverse quantum machine learning models with a
theoretical guarantee while reducing the training cost.
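As a generic stand-in for the coreset-selection step (the paper's actual selection criterion for quantum models is not reproduced here), greedy k-center selection picks a small subset whose points cover the full dataset well:

```python
import numpy as np

rng = np.random.default_rng(3)

def kcenter_coreset(X, k):
    """Greedy k-center selection: start from a random point, then
    repeatedly add the point farthest from the current coreset."""
    idx = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[idx[0]], axis=1)   # distance to nearest center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)

X = rng.normal(size=(500, 8))
core = kcenter_coreset(X, 25)

# Covering radius: worst-case distance from any point to its nearest coreset point
cover = lambda S: np.max(np.min(np.linalg.norm(X[:, None] - X[S], axis=2), axis=1))
print(cover(core))  # shrinks as k grows, trading coreset size for fidelity
```

Training on the distilled subset then trades a controlled loss in coverage for a large reduction in the number of (expensive) quantum circuit evaluations.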
( 2
min )
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap, class imbalance, and the
influence of differing capture settings between the HPA and HubMAP datasets.
The presented approach achieves results comparable with the state of the art
in functional tissue unit segmentation at the cellular level. The source code is
available at https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2
min )
We consider decentralized learning for zero-sum games, where players only see
their payoff information and are agnostic to actions and payoffs of the
opponent. Previous works demonstrated convergence to a Nash equilibrium in this
setting using double time-scale algorithms under strong reachability
assumptions. We address the open problem of achieving an approximate Nash
equilibrium efficiently with an uncoupled and single time-scale algorithm under
weaker conditions. Our contribution is a rational and convergent algorithm,
utilizing Tsallis-entropy regularization in a value-iteration-based approach.
The algorithm learns an approximate Nash equilibrium in polynomial time,
requiring only the existence of a policy pair that induces an irreducible and
aperiodic Markov chain, thus considerably weakening past assumptions. Our
analysis leverages negative drift inequalities and introduces novel properties
of Tsallis entropy that are of independent interest.
( 2
min )
This paper extends our previous method for COVID-19 diagnosis, proposing an
enhanced solution for detecting COVID-19 from computed tomography (CT) images.
To decrease model misclassifications, two key steps of image processing were
employed. Firstly, the uppermost and lowermost slices were removed, preserving
sixty percent of each patient's slices. Secondly, all slices underwent manual
cropping to emphasize the lung areas. Subsequently, resized CT scans (224 by
224) were input into an Xception transfer learning model. Leveraging Xception's
architecture and pre-trained weights, the modified model achieved binary
classification. Promising results on the COV19-CT database showcased higher
validation accuracy and macro F1 score at both the slice and patient levels
compared to our previous solution and alternatives on the same dataset.
( 2
min )
Cadastres from the 19th century are a complex as well as rich source for
historians and archaeologists, whose use presents them with great challenges.
For archaeological and historical remote sensing, we have trained several Deep
Learning models, CNNs as well as Vision Transformers, to extract large-scale
data from this knowledge representation. We present the principal results of
our work here, along with the demonstrator of our browser-based tool that
allows researchers and public stakeholders to quickly identify spots that
featured buildings in the 19th century Franciscean Cadastre. The tool not only
supports scholars and fellow researchers in building a better understanding of
the settlement history of the region of Styria, it also helps public
administration and fellow citizens to swiftly identify areas of heightened
sensibility with regard to the cultural heritage of the region.
( 2
min )
Popular guidance for denoising diffusion probabilistic models (DDPMs) linearly
combines distinct conditional models to provide enhanced control over
samples. However, this approach overlooks nonlinear effects that become
significant when the guidance scale is large. To address this issue, we propose
characteristic guidance, a novel method that provides non-linear correction for
classifier-free guided DDPMs. Such correction forces the guided DDPMs to
respect the Fokker-Planck equation of their underlying diffusion process, in a
way that is first-principle, training-free, derivative-free, and compatible
with existing sampling methods. Experiments show that characteristic guidance
is robust to various applications, offers enhanced control over sample
generation, suppresses color and exposure issues even for latent space
sampling, and can handle physics problems such as phase transitions.
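The linear combination that the abstract refers to is standard classifier-free guidance; the nonlinear characteristic correction itself is not reproduced here, but the baseline it fixes is just:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: linearly combine the unconditional and
    conditional noise predictions. At large guidance scale w this linear
    extrapolation ignores the nonlinear effects the paper corrects."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.1, -0.2])  # illustrative unconditional prediction
eps_c = np.array([0.3, 0.1])   # illustrative conditional prediction
print(cfg_combine(eps_u, eps_c, 1.0))  # w = 1 recovers the conditional model
print(cfg_combine(eps_u, eps_c, 7.5))  # large w extrapolates linearly
```

The guided prediction at w = 1 is exactly the conditional model; characteristic guidance modifies the large-w regime so the combined model still satisfies the Fokker-Planck equation of the diffusion.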
( 2
min )
Likelihood-free inference is quickly emerging as a powerful tool to perform
fast/effective parameter estimation. We demonstrate a technique of optimizing
likelihood-free inference to make it even faster by marginalizing symmetries in
a physical problem. In this approach, physical symmetries, for example,
time-translation are learned using joint-embedding via self-supervised learning
with symmetry data augmentations. Subsequently, parameter inference is
performed using a normalizing flow where the embedding network is used to
summarize the data before conditioning the parameters. We present this approach
on two simple physical problems and we show faster convergence in a smaller
number of parameters compared to a normalizing flow that does not use a
pre-trained symmetry-informed representation.
( 2
min )
The utilization of deep learning-based object detection is an effective
approach to assist visually impaired individuals in avoiding obstacles. In this
paper, we implemented seven different YOLO object detection models
\textit{viz}., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and
YOLOv5 and performed comprehensive evaluation with carefully tuned
hyperparameters, to analyze how these models performed on images containing
common daily-life objects presented on roads and sidewalks. After a systematic
investigation, YOLOv8 was found to be the best model, which reached a precision
of $80\%$ and a recall of $68.2\%$ on a well-known Obstacle Dataset, which
includes images from the VOC, COCO, and TT100K datasets along with images
collected by the researchers in the field. Despite being the latest
model and demonstrating better performance in many other applications, YOLO-NAS
was found to be suboptimal for the obstacle detection task.
( 2
min )
Sleep detection and annotation are crucial for researchers to understand
sleep patterns, especially in children. With modern wrist-worn watches
comprising built-in accelerometers, sleep logs can be collected. However, the
annotation of these logs into distinct sleep events: onset and wakeup, proves
to be challenging. These annotations must be automated, precise, and scalable.
We propose to model the accelerometer data using different machine learning
(ML) techniques such as support vectors, boosting, ensemble methods, and more
complex approaches involving LSTMs and Region-based CNNs. Later, we aim to
evaluate these approaches using the Event Detection Average Precision (EDAP)
score (similar to the IOU metric) to eventually compare the predictive power
and model performance.
( 2
min )
Safeguarding privacy in sensitive training data is paramount, particularly in
the context of generative modeling. This is typically done either through
differentially private stochastic gradient descent or with a differentially
private metric for training models or generators. In this paper, we introduce
a novel
differentially private generative modeling approach based on parameter-free
gradient flows in the space of probability measures. The proposed algorithm is
a new discretized flow which operates through a particle scheme, utilizing
drift derived from the sliced Wasserstein distance and computed in a private
manner. Our experiments show that compared to a generator-based model, our
proposed model can generate higher-fidelity data at a low privacy budget,
offering a viable alternative to generator-based approaches.
( 2
min )
Influenced mixed moving average fields are a versatile modeling class for
spatio-temporal data. However, their predictive distribution is not generally
known. Under this modeling assumption, we define a novel spatio-temporal
embedding and a theory-guided machine learning approach that employs a
generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz
predictors and determine fixed-time and any-time PAC Bayesian bounds in the
batch learning setting. Performing causal forecasts is a highlight of our
methodology, as is its potential application to data with spatial and temporal
short- and long-range dependence. We then test the performance of our learning
methodology by using linear predictors and data sets simulated from a
spatio-temporal Ornstein-Uhlenbeck process.
( 2
min )
The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a
factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix.
RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional
arithmetic operations, and it can be implemented with just a few lines of code.
The method is particularly useful for approximating a kernel matrix.
This paper offers a thorough new investigation of the empirical and
theoretical behavior of this fundamental algorithm. For matrix approximation
problems that arise in scientific machine learning, experiments show that
RPCholesky matches or beats the performance of alternative algorithms.
Moreover, RPCholesky provably returns low-rank approximations that are nearly
optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly
support its use in scientific computing and machine learning applications.
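The abstract's claim that RPCholesky fits in a few lines can be made concrete; this is the standard form of the algorithm, with pivots sampled proportionally to the residual diagonal (the kernel and sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def rpcholesky(A, k):
    """Randomly pivoted partial Cholesky: rank-k psd approximation
    A ~= F @ F.T, touching only k columns and the diagonal of A."""
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.diag(A).copy()                   # residual diagonal
    for i in range(k):
        s = rng.choice(N, p=d / d.sum())    # pivot ~ residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]   # residual column at the pivot
        F[:, i] = g / np.sqrt(g[s])
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F

# Example: approximate an RBF kernel matrix on random points
X = rng.normal(size=(300, 3))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
F = rpcholesky(K, 60)
rel_err = np.linalg.norm(K - F @ F.T) / np.linalg.norm(K)
print(rel_err)  # small relative error at rank 60 out of 300
```

Sampling pivots by the residual diagonal, rather than greedily or uniformly, is what gives the method its near-optimality guarantees while keeping the cost at (k + 1) N entry evaluations.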
( 2
min )
The Energy and Climate Hack presented opportunities for students and companies to collaborate and develop innovative solutions.
( 8
min )
Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, […]
( 16
min )
This is a customer post jointly authored by ICL and AWS employees. ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored […]
( 8
min )
Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you […]
( 8
min )
Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging. Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November […]
( 9
min )
This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to use PySpark.
( 10
min )
This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass. Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership. It’s all Read article >
( 6
min )
We present Cross-Client Label Propagation(XCLP), a new method for
transductive federated learning. XCLP estimates a data graph jointly from the
data of multiple clients and computes labels for the unlabeled data by
propagating label information across the graph. To avoid clients having to
share their data with anyone, XCLP employs two cryptographically secure
protocols: secure Hamming distance computation and secure summation. We
demonstrate two distinct applications of XCLP within federated learning. In the
first, we use it in a one-shot way to predict labels for unseen test points. In
the second, we use it to repeatedly pseudo-label unlabeled training data in a
federated semi-supervised setting. Experiments on both real federated and
standard benchmark datasets show that in both applications XCLP achieves higher
classification accuracy than alternative approaches.
( 2
min )
In this paper, we study the mistake bound of online kernel learning on a
budget. We propose a new budgeted online kernel learning model, called
Ahpatron, which significantly improves the mistake bound of previous work and
resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We
first present an aggressive variant of Perceptron, named AVP, a model without
budget, which uses an active updating rule. Then we design a new budget
maintenance mechanism, which removes half of the examples and projects the
removed examples onto a hypothesis space spanned by the remaining examples.
Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses
prove that Ahpatron has tighter mistake bounds, and experimental results show
that Ahpatron outperforms the state-of-the-art algorithms on the same or a
smaller budget.
( 2
min )
We present the first optimal rates for infinite-dimensional vector-valued
ridge regression on a continuous scale of norms that interpolate between $L_2$
and the hypothesis space, which we consider as a vector-valued reproducing
kernel Hilbert space. These rates allow us to treat the misspecified case in which
the true regression function is not contained in the hypothesis space. We
combine standard assumptions on the capacity of the hypothesis space with a
novel tensor product construction of vector-valued interpolation spaces in
order to characterize the smoothness of the regression function. Our upper
bound not only attains the same rate as real-valued kernel ridge regression,
but also removes the assumption that the target regression function is bounded.
For the lower bound, we reduce the problem to the scalar setting using a
projection argument. We show that these rates are optimal in most cases and
independent of the dimension of the output space. We illustrate our results for
the special case of vector-valued Sobolev spaces.
( 2
min )
We propose a novel algorithmic framework for distributional reinforcement
learning, based on learning finite-dimensional mean embeddings of return
distributions. We derive several new algorithms for dynamic programming and
temporal-difference learning based on this framework, provide asymptotic
convergence theory, and examine the empirical performance of the algorithms on
a suite of tabular tasks. Further, we show that this approach can be
straightforwardly combined with deep reinforcement learning, and obtain a new
deep RL agent that improves over baseline distributional approaches on the
Arcade Learning Environment.
( 2
min )
We present ELSA, a practical solution for creating deep networks that can
easily be deployed at different levels of sparsity. The core idea is to embed
one or more sparse networks within a single dense network as a proper subset of
the weights. At prediction time, any sparse model can be extracted effortlessly
simply by zeroing out weights according to a predefined mask. ELSA is simple,
powerful and highly flexible. It can use essentially any existing technique for
network sparsification and network training. In particular, it does not
restrict the loss function, architecture or the optimization technique. Our
experiments show that ELSA's advantage of flexible deployment comes with no or
only a negligible reduction in prediction quality compared to the standard way
of using multiple sparse networks that are trained and stored independently.
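The extraction step described above is essentially mask application; a minimal sketch, using a magnitude-based mask as one of the many sparsification criteria ELSA permits:

```python
import numpy as np

rng = np.random.default_rng(5)

def extract_sparse(dense_weights, sparsity):
    """Extract an embedded sparse model from the dense weights by zeroing
    out entries according to a predefined mask. Here the mask drops the
    smallest-magnitude weights; ELSA allows any sparsification technique."""
    w = dense_weights.ravel()
    k = int(len(w) * sparsity)               # number of weights to drop
    mask = np.ones_like(w)
    mask[np.argsort(np.abs(w))[:k]] = 0.0    # zero the smallest magnitudes
    return (w * mask).reshape(dense_weights.shape)

W = rng.normal(size=(128, 64))               # a dense layer's weights
for s in (0.5, 0.9):
    Ws = extract_sparse(W, s)
    print(s, np.mean(Ws == 0.0))             # achieved sparsity matches the target
```

Because every sparsity level shares the same dense parameter tensor, only the masks need to be stored alongside one network, instead of one independently trained network per sparsity level.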
( 2
min )
This paper presents a novel methodology for improving the performance of
machine learning based space traffic management tasks through the use of a
pre-trained orbit model. Taking inspiration from BERT-like self-supervised
language models in the field of natural language processing, we introduce
ORBERT, and demonstrate the ability of such a model to leverage large
quantities of readily available orbit data to learn meaningful representations
that can be used to aid in downstream tasks. As a proof of concept of this
approach we consider the task of all vs. all conjunction screening, phrased
here as a machine learning time series classification task. We show that
leveraging unlabelled orbit data leads to improved performance, and that the
proposed approach can be particularly beneficial for tasks where the
availability of labelled data is limited.
( 2
min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
( 2
min )
In this paper, we study the mistake bound of online kernel learning on a
budget. We propose a new budgeted online kernel learning model, called
Ahpatron, which significantly improves the mistake bound of previous work and
resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We
first present an aggressive variant of Perceptron, named AVP, a model without
budget, which uses an active updating rule. Then we design a new budget
maintenance mechanism, which removes half of the examples and projects the
removed examples onto the hypothesis space spanned by the remaining examples.
Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses
prove that Ahpatron has tighter mistake bounds, and experimental results show
that Ahpatron outperforms the state-of-the-art algorithms on the same or a
smaller budget.
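The budget step described above, removing half of the stored examples and projecting the removed part of the hypothesis onto the span of the remaining ones, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the Gaussian kernel and random data are assumptions.

```python
import numpy as np

# Sketch of kernel budget maintenance by projection. The hypothesis is
# f(x) = sum_i alpha_i k(x_i, x) in an RKHS; we drop half the examples and
# project the dropped component onto span{k(x_j, .) : j kept}.

rng = np.random.default_rng(0)

def k(A, B, gamma=0.5):
    """Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X = rng.normal(size=(8, 3))   # stored (budgeted) examples
alpha = rng.normal(size=8)    # their coefficients

keep, drop = np.arange(4), np.arange(4, 8)

# Solve K_kk beta = K_kd alpha_d (regularized for numerical stability),
# then fold beta into the kept coefficients.
K_kk = k(X[keep], X[keep])
K_kd = k(X[keep], X[drop])
beta = np.linalg.solve(K_kk + 1e-8 * np.eye(len(keep)), K_kd @ alpha[drop])
alpha_new = alpha[keep] + beta

def approx_err(a_keep):
    """Squared RKHS norm of f minus the hypothesis with coefficients
    a_keep on the kept examples."""
    c = alpha.copy()
    c[keep] = c[keep] - a_keep
    return c @ k(X, X) @ c

# Projecting is at least as close to f (in RKHS norm) as simply
# discarding the removed examples.
err_proj = approx_err(alpha_new)
err_drop = approx_err(alpha[keep])
```

The normal equations above are exactly the first-order conditions of minimizing the RKHS distance to the original hypothesis over coefficients supported on the kept examples.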
( 2
min )
Low-level spatial detail and high-level semantic abstraction are both
essential to the semantic segmentation task. Features extracted by deep
networks carry rich semantic information, but much spatial information is
lost along the way. However, how to recover spatial detail
information effectively and fuse it with high-level semantics has not been well
addressed so far. In this paper, we propose a new architecture based on
Bilateral Segmentation Network (BiSeNet) called Multi-scale Covariance Feature
Fusion Network (MCFNet). Specifically, this network introduces a new feature
refinement module and a new feature fusion module. Furthermore, a gating unit
named L-Gate is proposed to filter out invalid information and fuse multi-scale
features. We evaluate our proposed model on Cityscapes, CamVid datasets and
compare it with the state-of-the-art methods. Extensive experiments show that
our method achieves competitive results. On Cityscapes, we achieve 75.5% mIoU
with a speed of 151.3 FPS.
( 2
min )
We present the first optimal rates for infinite-dimensional vector-valued
ridge regression on a continuous scale of norms that interpolate between $L_2$
and the hypothesis space, which we consider as a vector-valued reproducing
kernel Hilbert space. These rates allow us to treat the misspecified case in which
the true regression function is not contained in the hypothesis space. We
combine standard assumptions on the capacity of the hypothesis space with a
novel tensor product construction of vector-valued interpolation spaces in
order to characterize the smoothness of the regression function. Our upper
bound not only attains the same rate as real-valued kernel ridge regression,
but also removes the assumption that the target regression function is bounded.
For the lower bound, we reduce the problem to the scalar setting using a
projection argument. We show that these rates are optimal in most cases and
independent of the dimension of the output space. We illustrate our results for
the special case of vector-valued Sobolev spaces.
( 2
min )
In this paper, we provide novel tail bounds on the optimization error of
Stochastic Mirror Descent for convex and Lipschitz objectives. Our analysis
extends the existing tail bounds from the classical light-tailed Sub-Gaussian
noise case to heavier-tailed noise regimes. We study the optimization error of
the last iterate as well as the average of the iterates. We instantiate our
results in two important cases: a class of noise with exponential tails and one
with polynomial tails. A remarkable feature of our results is that they do not
require an upper bound on the diameter of the domain. Finally, we support our
theory with illustrative experiments that compare the behavior of the average
of the iterates with that of the last iterate in heavy-tailed noise regimes.
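For concreteness, a standard instance of Stochastic Mirror Descent, exponentiated-gradient updates induced by the negative-entropy mirror map on the probability simplex, can be sketched as follows; the linear objective, noise model, and step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch: Stochastic Mirror Descent on the simplex with the
# negative-entropy mirror map, which yields multiplicative
# (exponentiated-gradient) updates. We track both the last iterate
# and the average of the iterates, as studied in the abstract.

rng = np.random.default_rng(0)

c = np.array([1.0, 0.5, 2.0])  # linear objective f(w) = <c, w> on the simplex
eta, T = 0.1, 2000
w = np.ones(3) / 3
avg = np.zeros(3)

for _ in range(T):
    g = c + 0.1 * rng.normal(size=3)   # stochastic gradient of f
    w = w * np.exp(-eta * g)           # mirror step (negative entropy)
    w /= w.sum()                       # normalization = Bregman projection
    avg += w / T

# Both the last iterate and the average concentrate on the minimizer,
# the vertex with the smallest coefficient of c (index 1 here).
```

Note that this sketch does not rely on a bound on the domain diameter; the simplex is bounded, but the update itself is well defined on any simplex-constrained problem.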
( 2
min )
The graduate students will aim to commercialize innovations in AI, machine learning, and data science.
( 8
min )
Study shows computational models trained to perform auditory tasks display an internal organization similar to that of the human auditory cortex.
( 9
min )
A new method enables optical devices that more closely match their design specifications, boosting accuracy and efficiency.
( 10
min )
Zipline isn’t just some pie-in-the-sky drone startup. The San Francisco-based company has completed more than 800,000 deliveries in seven countries since its start in 2011. It recently added services for Seattle’s Pagliacci Pizza, vitamin and supplement giant GNC, and large health systems like Intermountain Health, OhioHealth and Michigan Medicine. Zipline developed its drones — which Read article >
( 6
min )
Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless. In this post, we explore how to use Amazon […]
( 8
min )
In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by […]
( 10
min )
Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional […]
( 13
min )
Axel Springer is the first publishing house globally to partner with us on a deeper integration of journalism in AI technologies.
( 2
min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, compared to lossless convexification, from on the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2
min )
We introduce a curriculum learning algorithm, Variational Automatic
Curriculum Learning (VACL), for solving challenging goal-conditioned
cooperative multi-agent reinforcement learning problems. We motivate our
paradigm through a variational perspective, where the learning objective can be
decomposed into two terms: task learning on the current task distribution, and
curriculum update to a new task distribution. Local optimization over the
second term suggests that the curriculum should gradually expand the training
tasks from easy to hard. Our VACL algorithm implements this variational
paradigm with two practical components, task expansion and entity progression,
which produce training curricula over both the task configurations and
the number of entities in the task. Experiment results show that VACL solves a
collection of sparse-reward problems with a large number of agents.
Particularly, using a single desktop machine, VACL achieves 98% coverage rate
with 100 agents in the simple-spread benchmark and reproduces the ramp-use
behavior originally shown in OpenAI's hide-and-seek project. Our project
website is at https://sites.google.com/view/vacl-neurips-2021.
( 2
min )
Multilinear Principal Component Analysis (MPCA) is a widely utilized method
for the dimension reduction of tensor data. However, the integration of MPCA
into federated learning remains unexplored in existing research. To tackle this
gap, this article proposes a Federated Multilinear Principal Component Analysis
(FMPCA) method, which enables multiple users to collaboratively reduce the
dimension of their tensor data while keeping each user's data local and
confidential. The proposed FMPCA method is guaranteed to have the same
performance as traditional MPCA. An application of the proposed FMPCA in
industrial prognostics is also demonstrated. Simulated data and a real-world
data set are used to validate the performance of the proposed method.
( 2
min )
This paper presents a novel algorithm that leverages Stochastic Gradient
Descent strategies in conjunction with Random Features to augment the
scalability of Conic Particle Gradient Descent (CPGD) specifically tailored for
solving sparse optimisation problems on measures. By formulating the CPGD steps
within a variational framework, we provide rigorous mathematical proofs
demonstrating the following key findings: (i) The total variation norms of the
solution measures along the descent trajectory remain bounded, ensuring
stability and preventing undesirable divergence; (ii) We establish a global
convergence guarantee with a convergence rate of
$\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ iterations, showcasing the efficiency
and effectiveness of our algorithm; (iii) Additionally, we analyze and
establish local control over the first-order condition discrepancy,
contributing to a deeper understanding of the algorithm's behavior and
reliability in practical applications.
( 2
min )
Differentiating noisy, discrete measurements in order to fit an ordinary
differential equation can be unreasonably effective. Assuming square-integrable
noise and minimal flow regularity, we construct and analyze a finite-difference
differentiation filter and a Tikhonov-regularized least squares estimator for
the continuous-time parameter-linear system. Combining these contributions in
series, we obtain a finite-sample bound on mean absolute error of estimation.
As a by-product, we offer a novel analysis of stochastically perturbed
Moore-Penrose pseudoinverses.
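The two-stage pipeline above, a finite-difference differentiation filter followed by Tikhonov-regularized least squares, can be sketched on a toy scalar parameter-linear system; the system $x' = a x$, the noise level, and the step size here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch: fit a parameter-linear ODE from noisy, discrete measurements.
# True system: x' = a * x with a = -0.5; Gaussian (square-integrable)
# measurement noise.

rng = np.random.default_rng(0)

a_true, h = -0.5, 0.01
t = np.arange(0.0, 5.0 + h, h)
x = np.exp(a_true * t) + 1e-3 * rng.normal(size=t.size)  # noisy samples

# Stage 1: differentiation filter (central differences).
dx = (x[2:] - x[:-2]) / (2 * h)
x_mid = x[1:-1]

# Stage 2: Tikhonov-regularized least squares for x' = a * x.
lam = 1e-6
a_hat = (x_mid @ dx) / (x_mid @ x_mid + lam)

print(a_hat)  # close to the true parameter -0.5
```

Even though differentiating the noisy signal amplifies the noise, the regularized least-squares stage averages it back out, which is the "unreasonably effective" combination the abstract refers to.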
( 2
min )
To address the bias of the canonical two-way fixed effects estimator for
difference-in-differences under staggered adoptions, Wooldridge (2021) proposed
the extended two-way fixed effects estimator, which adds many parameters.
However, this reduces efficiency. Restricting some of these parameters to be
equal helps, but ad hoc restrictions may reintroduce bias. We propose a machine
learning estimator with a single tuning parameter, fused extended two-way fixed
effects (FETWFE), that enables automatic data-driven selection of these
restrictions. We prove that under an appropriate sparsity assumption FETWFE
identifies the correct restrictions with probability tending to one. We also
prove the consistency, asymptotic normality, and oracle efficiency of FETWFE
for two classes of heterogeneous marginal treatment effect estimators under
either conditional or marginal parallel trends, and we prove consistency for
two classes of conditional average treatment effects under conditional parallel
trends. We demonstrate FETWFE in simulation studies and an empirical
application.
( 2
min )
Phi-2 is now accessible on the Azure model catalog. Its compact size and new innovations in model scaling and training data curation make it ideal for exploration around mechanistic interpretability, safety improvements, and fine-tuning experimentation on a variety of tasks.
The post Phi-2: The surprising power of small language models appeared first on Microsoft Research.
( 11
min )
The launch of ChatGPT and rise in popularity of generative AI have captured the imagination of customers who are curious about how they can use this technology to create new products and services on AWS, such as enterprise chatbots, which are more conversational. This post shows you how you can create a web UI, which […]
( 9
min )
Large language models (or LLMs) have become a topic of daily conversations. Their quick adoption is evident from the time required to reach 100 million users, which has gone from “4.5yrs by facebook” to an all-time low of a mere “2 months by ChatGPT.” A generative pre-trained transformer (GPT) uses causal autoregressive updates […]
( 7
min )
Vodafone is transitioning from a telecommunications company (telco) to a technology company (TechCo) by 2025, with objectives of innovating faster, reducing costs, improving security, and simplifying operations. Thousands of engineers are being onboarded to contribute to this transition. By 2025, Vodafone plans to have 50% of its global workforce actively involved in software development, with […]
( 6
min )
Justin Solomon applies modern geometric techniques to solve problems in computer vision, machine learning, statistics, and beyond.
( 10
min )
The creative team at Moonshine Studio — an artist-focused visual effects (VFX) studio specializing in animation and motion design — was tasked to solve a problem.
( 7
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of the first-order logic. Namely, any
query of the two-variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barcelo & Al., 2020, Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as Rectified Linear Units) and answers an open
question formulated by [Grohe, 2021].
( 2
min )
In the era of artificial intelligence, data is gold but costly to annotate.
This paper demonstrates a groundbreaking solution to this dilemma using ChatGPT
for text augmentation in sentiment analysis. We leverage ChatGPT's generative
capabilities to create synthetic training data that significantly improves the
performance of smaller models, making them competitive with, or even
outperforming, their larger counterparts. This innovation enables models to be
both efficient and effective, thereby reducing computational cost, inference
time, and memory usage without compromising on quality. Our work marks a key
advancement in the cost-effective development and deployment of robust
sentiment analysis models.
( 2
min )
The Chinese Space Station Telescope (abbreviated as CSST) is a future
advanced space telescope. Real-time identification of galaxy and nebula/star
cluster (abbreviated as NSC) images is of great value during the CSST survey. While
recent research on celestial object recognition has progressed, the rapid and
efficient identification of high-resolution local celestial images remains
challenging. In this study, we conducted galaxy and NSC image classification
research using deep learning methods based on data from the Hubble Space
Telescope. We built a Local Celestial Image Dataset and designed a deep
learning model named HR-CelestialNet for classifying images of the galaxy and
NSC. HR-CelestialNet achieved an accuracy of 89.09% on the testing set,
outperforming models such as AlexNet, VGGNet and ResNet, while demonstrating
faster recognition speeds. Furthermore, we investigated the factors influencing
CSST image quality and evaluated the generalization ability of HR-CelestialNet
on the blurry image dataset, demonstrating its robustness to low image quality.
The proposed method can enable real-time identification of celestial images
during the CSST survey mission.
( 2
min )
Assurance Cases (ACs) are an established approach in safety engineering to
argue quality claims in a structured way. In the context of quality assurance
for Machine Learning (ML)-based software components, ACs are also being
discussed and appear promising. Tools for operationalizing ACs do exist, yet
mainly focus on supporting safety engineers on the system level. However,
assuring the quality of an ML component within the system is commonly the
responsibility of data scientists, who are usually less familiar with these
tools. To address this gap, we propose a framework to support the
operationalization of ACs for ML components based on technologies that data
scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to
make the process of creating ML-related evidence in ACs more effective. Results
from the application of the framework, documented through notebooks, can be
integrated into existing AC tools. We illustrate the application of the
framework on an example excerpt concerned with the quality of the test data.
( 3
min )
Training generative models to produce synthetic data is meant to provide a
privacy-friendly approach to data release. However, we get robust guarantees
only when models are trained to satisfy Differential Privacy (DP). Alas, this
is not the standard in industry as many companies use ad-hoc strategies to
empirically evaluate privacy based on the statistical similarity between
synthetic and real data. In this paper, we review the privacy metrics offered
by leading companies in this space and shed light on a few critical flaws in
reasoning about privacy entirely via empirical evaluations. We analyze the
undesirable properties of the most popular metrics and filters and demonstrate
their unreliability and inconsistency through counter-examples. We then present
a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all
attributes of) at least 78% of the low-density train records (or outliers) with
only black-box access to a single fitted generative model and the privacy
metrics. Finally, we show that applying DP only to the model or using
low-utility generators does not mitigate ReconSyn as the privacy leakage
predominantly comes from the metrics. Overall, our work serves as a warning to
practitioners not to deviate from established privacy-preserving mechanisms.
( 2
min )
Communication networks able to withstand hostile environments are critically
important for disaster relief operations. In this paper, we consider a
challenging scenario where drones have been compromised in the supply chain,
during their manufacture, and harbour malicious software capable of
wide-ranging and infectious disruption. We investigate multi-agent deep
reinforcement learning as a tool for learning defensive strategies that
maximise communications bandwidth despite continual adversarial interference.
Using a public challenge for learning network resilience strategies, we propose
a state-of-the-art expert technique and study its superiority over deep
reinforcement learning agents. Correspondingly, we identify three specific
methods for improving the performance of our learning-based agents: (1)
ensuring each observation contains the necessary information, (2) using expert
agents to provide a curriculum for learning, and (3) paying close attention to
reward. We apply our methods and present a new mixed strategy enabling expert
and learning-based agents to work together and improve on all prior results.
( 2
min )
Can we learn policies in reinforcement learning without rewards? Can we learn
a policy just by trying to reach a goal state? We answer these questions
positively by proposing a multi-step procedure that first learns a world model
that goes backward in time, secondly generates goal-reaching backward
trajectories, thirdly improves those sequences using shortest path finding
algorithms, and finally trains a neural network policy by imitation learning.
We evaluate our method on a deterministic maze environment where the
observations are $64\times 64$ pixel bird's-eye images, and show that it
consistently reaches several goals.
( 2
min )
SCGAN adds a similarity constraint between generated images and conditions as
a regularization term on generative adversarial networks. Similarity constraint
works as a tutor to instruct the generator network to comprehend the difference
of representations based on conditions. We analyze how SCGAN works at a
deeper level and find that the similarity constraint functions like a
contrastive loss. We believe that a
model with high understanding and intelligence measures the similarity between
images based on their structure and high level features, just like humans do.
We applied two major changes to SCGAN to obtain a modified model: using SSIM
to measure similarity between images, and applying contrastive-loss
principles to the similarity constraint. The modified model performs better
using FID and FactorVAE metrics. The modified model also has better
generalisability compared to other models.
Keywords: Generative Adversarial Nets, Unsupervised Learning, Disentangled Representation Learning, Contrastive Disentanglement, SSIM
( 2
min )
The discovery of neural architectures from simple building blocks is a
long-standing goal of Neural Architecture Search (NAS). Hierarchical search
spaces are a promising step towards this goal but lack a unifying search space
design framework and typically only search over some limited aspect of
architectures. In this work, we introduce a unifying search space design
framework based on context-free grammars that can naturally and compactly
generate expressive hierarchical search spaces that are 100s of orders of
magnitude larger than common spaces from the literature. By enhancing and using
their properties, we effectively enable search over the complete architecture
and can foster regularity. Further, we propose an efficient hierarchical kernel
design for a Bayesian Optimization search strategy to efficiently search over
such huge spaces. We demonstrate the versatility of our search space design
framework and show that our search strategy can be superior to existing NAS
approaches. Code is available at
https://github.com/automl/hierarchical_nas_construction.
( 2
min )
We’re proud to have 100+ accepted papers at NeurIPS 2023, plus 18 workshops. Several submissions were chosen as oral presentations and spotlight posters, reflecting groundbreaking concepts, methods, or applications. Here’s an overview of those submissions.
The post NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation appeared first on Microsoft Research.
( 16
min )
The series aims to help policymakers create better oversight of AI in society.
( 12
min )
In today’s digital marketing world, things are changing fast, and artificial intelligence (AI) is a big part of that. Companies want to stay ahead, so they’re smartly choosing to get help from outside experts in digital marketing who use AI tools. This helps them make the most of what AI can do. AI is like… Read More »Maximizing marketing potential: The AI-driven revolution in outsourced digital marketing
The post Maximizing marketing potential: The AI-driven revolution in outsourced digital marketing appeared first on Data Science Central.
( 22
min )
Much has been said about the economic impact of AGI, and some of it is already being felt. But not much has been proposed about solutions. Specifically, what approaches should policymakers take? Here, I propose that policymakers should encourage two key trends which, together, could alleviate the issues of AI: the gig economy and… Read More »Universal basic income and the gig economy: A combined policy approach to alleviate the challenges of AI
The post Universal basic income and the gig economy: A combined policy approach to alleviate the challenges of AI appeared first on Data Science Central.
( 21
min )
Great companies thrive on stories. Sid Siddeek, who runs NVIDIA’s venture capital arm, knows this well. Siddeek still remembers one of his first jobs, schlepping presentation materials from one investor meeting to another, helping the startup’s CEO and management team get the story out while working from a trailer that “shook when the door opened,” Read article >
( 7
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” we outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure those more just and… Read More »AI and Justice in a Brave New World Part 2 – Humanizing AI
The post AI and Justice in a Brave New World Part 2 – Humanizing AI appeared first on Data Science Central.
( 22
min )
Finding classifiers robust to adversarial examples is critical for their safe
deployment. Determining the robustness of the best possible classifier under a
given threat model for a given data distribution and comparing it to that
achieved by state-of-the-art training methods is thus an important diagnostic
tool. In this paper, we find achievable information-theoretic lower bounds on
loss in the presence of a test-time attacker for multi-class classifiers on any
discrete dataset. We provide a general framework for finding the optimal 0-1
loss that revolves around the construction of a conflict hypergraph from the
data and adversarial constraints. We further define other variants of the
attacker-classifier game that determine the range of the optimal loss more
efficiently than the full-fledged hypergraph construction. Our evaluation
shows, for the first time, an analysis of the gap to optimal robustness for
classifiers in the multi-class setting on benchmark datasets.
( 2
min )
We explore colour versus shape goal misgeneralization originally demonstrated
by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an
ambiguous choice, the agents seem to prefer generalization based on colour
rather than shape. After training over 1,000 agents in a simplified version of
the environment and evaluating them on over 10 million episodes, we conclude
that the behaviour can be attributed to the agents learning to detect the goal
object through a specific colour channel. This choice is arbitrary.
Additionally, we show how, due to underspecification, the preferences can
change when retraining the agents using exactly the same procedure except for
using a different random seed for the training run. Finally, we demonstrate the
existence of outliers in out-of-distribution behaviour based on training random
seed alone.
( 2
min )
The Classification Tree (CT) is one of the most common models in
interpretable machine learning. Although such models are usually built with
greedy strategies, in recent years, thanks to remarkable advances in
Mixed-Integer Programming (MIP) solvers, several exact formulations of the
learning problem have been developed. In this paper, we argue that some of the
most relevant ones among these training models can be encapsulated within a
general framework, whose instances are shaped by the specification of loss
functions and regularizers. Next, we introduce a novel realization of this
framework: specifically, we consider the logistic loss, handled in the MIP
setting by a linear piece-wise approximation, and couple it with
$\ell_1$-regularization terms. The resulting Optimal Logistic Tree model
numerically proves capable of inducing trees with enhanced interpretability
features and competitive generalization capabilities, compared to
state-of-the-art MIP-based approaches.
( 2
min )
We report the effects of replacing the scaled dot-product (within softmax)
attention with the negative-log of Euclidean distance. This form of attention
simplifies to inverse distance weighting interpolation. Used in simple one
hidden layer networks and trained with vanilla cross-entropy loss on
classification problems, it tends to produce a key matrix containing prototypes
and a value matrix with corresponding logits. We also show that the resulting
interpretable networks can be augmented with manually-constructed prototypes to
perform low-impact handling of special cases.
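The reduction to inverse distance weighting can be checked numerically: softmax over $-\log d_i$ equals $(1/d_i)/\sum_j (1/d_j)$. Below is a minimal sketch with made-up query, key, and value matrices.

```python
import numpy as np

# Sketch: attention scores given by the negative log of the Euclidean
# distance, as described above. Since
#   softmax(-log d_i) = (1/d_i) / sum_j (1/d_j),
# the attention weights reduce exactly to inverse distance weighting.

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # a single query
K = rng.normal(size=(5, 4))   # keys (prototypes)
V = rng.normal(size=(5, 3))   # values (e.g. logits)

d = np.linalg.norm(K - q, axis=1)          # Euclidean distances
scores = -np.log(d)
w = np.exp(scores) / np.exp(scores).sum()  # softmax over -log d

w_idw = (1 / d) / (1 / d).sum()            # inverse distance weights
out = w @ V                                # attention output

assert np.allclose(w, w_idw)
```

The identity holds for any positive distances, which is why the key matrix can be read as a set of prototypes: the query's output interpolates the values of its nearest keys.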
( 2
min )
In this paper, we study the method to reconstruct dynamical systems from data
without time labels. Data without time labels appear in many applications, such
as molecular dynamics, single-cell RNA sequencing etc. Reconstruction of
dynamical system from time sequence data has been studied extensively. However,
these methods do not apply if time labels are unknown. Without time labels,
sequence data becomes distribution data. Based on this observation, we propose
to treat the data as samples from a probability distribution and try to
reconstruct the underlying dynamical system by minimizing the distribution
loss, sliced Wasserstein distance more specifically. Extensive experiment
results demonstrate the effectiveness of the proposed method.
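The distribution loss mentioned above, the sliced Wasserstein distance, can be sketched as follows (an illustrative implementation, not the authors' code): project both samples onto random directions and compare sorted one-dimensional projections.

```python
import numpy as np

# Sketch: Monte Carlo estimate of the squared sliced 2-Wasserstein
# distance between two equal-size samples X, Y of shape (n, dim).

def sliced_w2_sq(X, Y, n_dirs=500, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    dim = X.shape[1]
    thetas = rng.normal(size=(n_dirs, dim))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for th in thetas:
        px, py = np.sort(X @ th), np.sort(Y @ th)
        total += np.mean((px - py) ** 2)  # 1-D W2^2 via sorted samples
    return total / n_dirs

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = X + np.array([1.0, 0.0])  # X translated by v = (1, 0)

# For a pure translation v, each slice contributes (theta . v)^2, whose
# average over the unit circle is |v|^2 / dim = 0.5.
d0 = sliced_w2_sq(X, X)
d1 = sliced_w2_sq(X, Y)
```

Because each one-dimensional transport problem is solved by sorting, the loss is cheap to evaluate and differentiable almost everywhere, which makes it a convenient objective for fitting a dynamical system to unlabeled samples.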
( 2
min )
Sentiment analysis of social media data is an emerging field with vast
applications in various domains. In this study, we developed a sentiment
analysis model to analyze social media sentiment, especially tweets, during
global conflicting scenarios. To establish our research experiment, we
identified a recent global dispute incident on Twitter and collected around
31,000 filtered Tweets for several months to analyze human sentiment worldwide.
( 2
min )
A simple graph on $n$ vertices may contain a lot of maximum cliques. But how
many can it potentially contain? We will define prime and composite graphs, and
we will show that if $n \ge 15$, then the graphs with the maximum number of
maximum cliques have to be composite. Moreover, we will show an edge bound from
which we will prove that if any factor of a composite graph has $\omega(G_i)
\ge 5$, then it cannot have the maximum number of maximum cliques. Using this
we will show that the graph containing $3^{\lfloor n/3 \rfloor}c$ maximum
cliques has the largest number of maximum cliques on $n$ vertices, where
$c\in\{1,\frac{4}{3},2\}$, depending on $n \text{ mod } 3$.
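As a small illustrative check of the $3^{\lfloor n/3 \rfloor}$ count (our own example, not taken from the paper), one can brute-force the maximum cliques of the complete 3-partite graph with parts of size 3 on $n = 9$ vertices, i.e. the complement of three disjoint triangles:

```python
from itertools import combinations

# Brute-force count of maximum cliques in the complete 3-partite graph
# with three parts of size 3 (n = 9). Each maximum clique picks exactly
# one vertex from each part, giving 3^(n/3) = 27 maximum cliques.

n = 9
parts = [set(range(0, 3)), set(range(3, 6)), set(range(6, 9))]

def is_edge(u, v):
    """Vertices are adjacent iff they lie in different parts."""
    return not any(u in p and v in p for p in parts)

def is_clique(S):
    return all(is_edge(u, v) for u, v in combinations(S, 2))

cliques = [S for r in range(1, n + 1)
           for S in combinations(range(n), r) if is_clique(S)]
omega = max(len(S) for S in cliques)
num_maximum = sum(1 for S in cliques if len(S) == omega)

print(omega, num_maximum)  # prints 3 27
```

This exhaustive check is only feasible for tiny $n$; the point is simply to see the $3^{n/3}$ structure that the extremal composite graphs exhibit.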
( 2
min )
We define and study a fully-convolutional neural network stochastic model,
NN-Turb, which generates a 1-dimensional field with some turbulent velocity
statistics. In particular, the generated process satisfies the Kolmogorov 2/3
law for second order structure function. It also presents negative skewness
across scales (i.e. Kolmogorov 4/5 law) and exhibits intermittency as
characterized by skewness and flatness. Furthermore, our model is never in
contact with turbulent data and only needs the desired statistical behavior of
the structure functions across scales for training.
( 2
min )
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
( 2
min )
Reliable uncertainty quantification (UQ) in machine learning (ML) regression
tasks is becoming the focus of many studies in materials and chemical science.
It is now well understood that average calibration is insufficient, and most
studies implement additional methods testing the conditional calibration with
respect to uncertainty, i.e. consistency. Consistency is assessed mostly by
so-called reliability diagrams. There exists however another way beyond average
calibration, which is conditional calibration with respect to input features,
i.e. adaptivity. In practice, adaptivity is the main concern of the final users
of a ML-UQ method, seeking for the reliability of predictions and uncertainties
for any point in features space. This article aims to show that consistency and
adaptivity are complementary validation targets, and that a good consistency
does not imply a good adaptivity. Adapted validation methods are proposed and
illustrated on a representative example.
( 2
min )
We present a performant, general-purpose gradient-guided nested sampling
algorithm, ${\tt GGNS}$, combining the state of the art in differentiable
programming, Hamiltonian slice sampling, clustering, mode separation, dynamic
nested sampling, and parallelization. This unique combination allows ${\tt
GGNS}$ to scale well with dimensionality and perform competitively on a variety
of synthetic and real-world problems. We also show the potential of combining
nested sampling with generative flow networks to obtain large amounts of
high-quality samples from the posterior distribution. This combination leads to
faster mode discovery and more accurate estimates of the partition function.
( 2
min )
To tackle long planning horizon problems in reinforcement learning with
general function approximation, we propose the first algorithm, termed
UCRL-WVTR, whose regret bound is both \emph{horizon-free} and
\emph{instance-dependent}, as it eliminates the polynomial dependency on the
planning horizon. The derived regret bound is \emph{sharp}, as it
matches the minimax lower bound when specialized to linear mixture MDPs up to
logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient}
with access to a regression oracle. The achievement of such a horizon-free,
instance-dependent, and sharp regret bound hinges upon (i) novel algorithm
designs: weighted value-targeted regression and a high-order moment estimator
in the context of general function approximation; and (ii) fine-grained
analyses: a novel concentration bound of weighted non-linear least squares and
a refined analysis which leads to the tight instance-dependent bound. We also
conduct comprehensive experiments to corroborate our theoretical findings.
( 2
min )
In the era of fast-paced precision medicine, observational studies play a
major role in properly evaluating new treatments in clinical practice. Yet,
unobserved confounding can significantly compromise causal conclusions drawn
from non-randomized data. We propose a novel strategy that leverages randomized
trials to quantify unobserved confounding. First, we design a statistical test
to detect unobserved confounding with strength above a given threshold. Then,
we use the test to estimate an asymptotically valid lower bound on the
unobserved confounding strength. We evaluate the power and validity of our
statistical test on several synthetic and semi-synthetic datasets. Further, we
show how our lower bound can correctly identify the absence and presence of
unobserved confounding in a real-world setting.
( 2
min )
Inventory management is crucial for businesses, but it can be tedious. It can make or break a business, regardless of its age. AI has revolutionized business management and inventory control. AI can now do more than just follow instructions. It can analyze inventory history, predict customer behavior, and anticipate business needs. Want to know what…
The post Harness the power of an AI-powered forecasting model to revitalize your business appeared first on Data Science Central.
( 26
min )
Between the two of them, ChatGPT-4 can generate the lyrics to Christmas carols, and DALL-E 3 can illustrate them!
Throw your old carol books away because this is the only guide you'll need.
12 Days of Christmas
"Please generate an illustration where each of the 12 days'
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
MIT researchers develop a customized onboarding process that helps a human learn when a model’s advice is trustworthy.
( 11
min )
We introduce SwiftSage, a novel agent framework inspired by the dual-process
theory of human cognition, designed to excel in action planning for complex
interactive reasoning tasks. SwiftSage integrates the strengths of behavior
cloning and prompting large language models (LLMs) to enhance task completion
performance. The framework comprises two primary modules: the Swift module,
representing fast and intuitive thinking, and the Sage module, emulating
deliberate thought processes. The Swift module is a small encoder-decoder LM
fine-tuned on the oracle agent's action trajectories, while the Sage module
employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a
heuristic method to harmoniously integrate the two modules, resulting in a more
efficient and robust problem-solving process. In 30 tasks from the ScienceWorld
benchmark, SwiftSage significantly outperforms other methods such as SayCan,
ReAct, and Reflexion, demonstrating its effectiveness in solving complex
interactive tasks.
( 2
min )
Maintenance work orders are commonly used to document information about wind
turbine operation and maintenance. This includes details about proactive and
reactive wind turbine downtimes, such as preventative and corrective
maintenance. However, the information contained in maintenance work orders is
often unstructured and difficult to analyze, presenting challenges for
decision-makers wishing to use it for optimizing operation and maintenance. To
address this issue, this work compares three different approaches to calculate
reliability key performance indicators from maintenance work orders. The first
approach involves manual labeling of the maintenance work orders by domain
experts, using the schema defined in an industrial guideline to assign the
label accordingly. The second approach involves the development of a model that
automatically labels the maintenance work orders using text classification
methods. Through this method, we are able to achieve macro average and weighted
average F1-Scores of 0.75 and 0.85 respectively. The third technique uses an
AI-assisted tagging tool to tag and structure the raw maintenance information,
together with a novel rule-based approach for extracting relevant maintenance
work orders for failure rate calculation. In our experiments, the AI-assisted
tool leads to an 88% drop in tagging time compared to the other two
approaches, while expert labeling and text classification are more accurate in
KPI extraction. Overall, our findings make extracting maintenance information
from maintenance work orders more efficient, enable the assessment of
reliability key performance indicators and therefore support the optimization
of wind turbine operation and maintenance.
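The macro-average and weighted-average F1-scores reported above differ only in how per-class scores are aggregated: the former averages classes equally, the latter weights by class support. A minimal sketch of the computation (with hypothetical labels, not the paper's data):

```python
def f1_scores(y_true, y_pred, labels):
    """Return (macro-average, weighted-average) F1 over the given labels."""
    per_class, support = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        support.append(sum(t == c for t in y_true))
    macro = sum(per_class) / len(labels)
    weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
    return macro, weighted
```

On imbalanced label sets such as maintenance work orders, the weighted average favors the frequent classes, which is why the two scores (0.75 vs. 0.85) can diverge.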
( 3
min )
Physics informed neural networks (PINNs) have recently been widely used for
robust and accurate approximation of PDEs. We provide rigorous upper bounds on
the generalization error of PINNs approximating solutions of the forward
problem for PDEs. An abstract formalism is introduced and stability properties
of the underlying PDE are leveraged to derive an estimate for the
generalization error in terms of the training error and number of training
samples. This abstract framework is illustrated with several examples of
nonlinear PDEs. Numerical experiments, validating the proposed theory, are also
presented.
( 2
min )
Recent advances in language models (LMs) have demonstrated significant
efficacy in tasks related to the arts and humanities. While LMs have exhibited
exceptional performance across a wide range of natural language processing
tasks, there are notable challenges associated with their utilization on small
datasets and their ability to replicate more creative human capacities. In this
study, we aim to address these challenges by training a Persian classical
poetry generation model using a transformer architecture on a specialized
dataset with no pretraining. Additionally, we propose a novel decoding method
to enhance coherence and meaningfulness in the generated poetry, effectively
managing the tradeoff between diversity and quality. Furthermore, the results
of our training approach and the proposed decoding method are evaluated through
a comprehensive set of automatic and human evaluations, which show their
superior capability to generate coherent and meaningful poetry compared to
other decoding methods and an existing Persian large language model (LLM).
( 2
min )
Knowledge graph construction (KGC) is a multifaceted undertaking involving
the extraction of entities, relations, and events. Traditionally, large
language models (LLMs) have been viewed as solitary task-solving agents in this
complex landscape. However, this paper challenges this paradigm by introducing
a novel framework, CooperKGC. Departing from the conventional approach,
CooperKGC establishes a collaborative processing network, assembling a KGC
collaboration team capable of concurrently addressing entity, relation, and
event extraction tasks. Our experiments unequivocally demonstrate that
fostering collaboration and information interaction among diverse agents within
CooperKGC yields superior results compared to individual cognitive processes
operating in isolation. Importantly, our findings reveal that the collaboration
facilitated by CooperKGC enhances knowledge selection, correction, and
aggregation capabilities across multiple rounds of interactions.
( 2
min )
Recent research on online Gradient Balancing (GraB) has revealed that there
exist permutation-based example orderings for SGD that are guaranteed to
outperform random reshuffling (RR). Whereas RR arbitrarily permutes training
examples, GraB leverages stale gradients from prior epochs to order examples --
achieving a provably faster convergence rate than RR. However, GraB is limited
by design: while it demonstrates an impressive ability to scale up training on
centralized data, it does not naturally extend to modern distributed ML
workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which
uses insights from prior work on kernel thinning to translate the benefits of
provably faster permutation-based example ordering to distributed settings.
With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate
over centralized GraB and outperforms distributed RR on a variety of benchmark
tasks.
( 2
min )
When optimizing problems with uncertain parameter values in a linear
objective, decision-focused learning enables end-to-end learning of these
values. We are interested in a stochastic scheduling problem in which
processing times are uncertain, which introduces uncertainty into the
constraints, and thus repair of an initial schedule may be needed. Historical
realizations of the stochastic processing times are available. We show how
existing decision-focused learning techniques based on stochastic smoothing can
be adapted to this scheduling problem. We include an extensive experimental
evaluation to investigate in which situations decision-focused learning
outperforms the state of the art for such problems: scenario-based stochastic
optimization.
( 2
min )
Among the commonly used non-destructive techniques, Ground Penetrating
Radar (GPR) is one of the most widely adopted today for assessing pavement
conditions in France. However, conventional radar systems and their forward
processing methods have shown their limitations for the physical and
geometrical characterization of very thin layers such as tack coats.
Nevertheless, Machine Learning methods applied to GPR with an inverse approach
have shown that it is numerically possible to identify tack coat
characteristics despite masking effects due to the low time-frequency
resolution observed in the raw B-scans. We therefore propose in this paper to
apply the inverse approach based on Machine Learning, already validated in
previous works on numerical data, to two experimental cases with different
pavement structures. The first case is a validation on known pavement
structures at Gustave Eiffel University (Nantes, France) with its pavement
fatigue carousel, and the second case focuses on a new real road in the
Vendée department (France). In both case studies, the performance of SVM/SVR
methods demonstrated the efficiency of supervised learning methods for
classifying and estimating the emulsion proportioning in the tack coats.
( 3
min )
This research introduces a sophisticated transfer learning model based on
Google's MobileNetV2 for breast cancer tumor classification into normal,
benign, and malignant categories, utilizing a dataset of 1576 ultrasound images
(265 normal, 891 benign, 420 malignant). The model achieves an accuracy of
0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and
MCC of 0.74. It examines image intensity distributions and misclassification
errors, offering improvements for future applications. Addressing dataset
imbalances, the study ensures a generalizable model. This work, using a dataset
from Baheya Hospital, Cairo, Egypt, compiled by Walid Al-Dhabyani et al.,
emphasizes MobileNetV2's potential in medical imaging, aiming to improve
diagnostic precision in oncology. Additionally, the paper explores
Streamlit-based deployment for real-time tumor classification, demonstrating
MobileNetV2's applicability in medical imaging and setting a benchmark for
future research in oncology diagnostics.
( 2
min )
We study the asymptotic generalization of an overparameterized linear model
for multiclass classification under the Gaussian covariates bi-level model
introduced in Subramanian et al. '22, where the number of data points,
features, and classes all grow together. We fully resolve the conjecture posed
in Subramanian et al. '22, matching the predicted regimes for generalization.
Furthermore, our new lower bounds are akin to an information-theoretic strong
converse: they establish that the misclassification rate goes to 0 or 1
asymptotically. One surprising consequence of our tight results is that the
min-norm interpolating classifier can be asymptotically suboptimal relative to
noninterpolating classifiers in the regime where the min-norm interpolating
regressor is known to be optimal.
The key to our tight analysis is a new variant of the Hanson-Wright
inequality which is broadly useful for multiclass problems with sparse labels.
As an application, we show that the same type of analysis can be used to
analyze the related multilabel classification problem under the same bi-level
ensemble.
( 2
min )
Recent advances in machine learning, specifically transformer architecture,
have led to significant advancements in commercial domains. These powerful
models have demonstrated superior capability to learn complex relationships and
often generalize better to new data and problems. This paper presents a novel
transformer-powered approach for enhancing prediction accuracy in multi-modal
output scenarios, where sparse experimental data is supplemented with
simulation data. The proposed approach integrates transformer-based
architecture with a novel graph-based hyper-parameter optimization technique.
The resulting system not only effectively reduces simulation bias, but also
achieves superior prediction accuracy compared to the prior method. We
demonstrate the efficacy of our approach on inertial confinement fusion
experiments, where only 10 shots of real-world data are available, as well as
synthetic versions of these experiments.
( 2
min )
This paper engages in a speculative exploration of the concept of an
artificial agent capable of conducting research. Initially, it examines how the
act of research can be conceptually characterized, aiming to provide a starting
point for discussions about what it means to create such agents. The focus then
shifts to the core components of research: question formulation, hypothesis
generation, and hypothesis verification. This discussion includes a
consideration of the potential and challenges associated with enabling machines
to autonomously perform these tasks. Subsequently, this paper briefly considers
the overlapping themes and interconnections that underlie them. Finally, the
paper presents preliminary thoughts on prototyping as an initial step towards
uncovering the challenges involved in developing these research-capable agents.
( 2
min )
In this paper, we propose a dimensionless anomaly detection method for
multivariate streams. Our method is independent of the unit of measurement for
the different stream channels, therefore dimensionless. We first propose the
variance norm, a generalisation of Mahalanobis distance to handle
infinite-dimensional feature space and singular empirical covariance matrix
rigorously. We then combine the variance norm with the path signature, an
infinite collection of iterated integrals that provide global features of
streams, to propose SigMahaKNN, a method for anomaly detection on
(multivariate) streams. We show that SigMahaKNN is invariant to stream
reparametrisation, stream concatenation and has a graded discrimination power
depending on the truncation level of the path signature. We implement
SigMahaKNN as open-source software and perform extensive numerical
experiments, showing significantly improved anomaly detection on streams
compared to isolation forest and local outlier factor in applications ranging
from language analysis and handwriting analysis to ship movement path analysis
and univariate time-series analysis.
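In finite dimensions, the variance norm described above reduces to a Mahalanobis-style distance, and a pseudo-inverse handles a singular empirical covariance matrix. The sketch below illustrates that finite-dimensional case only; it is not the authors' implementation, which also treats infinite-dimensional signature features rigorously:

```python
import numpy as np

def variance_norm(x: np.ndarray, corpus: np.ndarray) -> float:
    """Mahalanobis-style norm of x relative to an empirical corpus.
    The pseudo-inverse keeps the norm well-defined when the empirical
    covariance matrix is singular."""
    mu = corpus.mean(axis=0)
    cov = np.cov(corpus, rowvar=False)
    d = x - mu
    return float(np.sqrt(d @ np.linalg.pinv(np.atleast_2d(cov)) @ d))
```

Because the norm is computed relative to the corpus covariance, rescaling any channel rescales both the data and the covariance consistently, which is what makes the method dimensionless.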
( 2
min )
Algorithms make a growing portion of policy and business decisions. We
develop a treatment-effect estimator using algorithmic decisions as instruments
for a class of stochastic and deterministic algorithms. Our estimator is
consistent and asymptotically normal for well-defined causal effects. A special
case of our setup is multidimensional regression discontinuity designs with
complex boundaries. We apply our estimator to evaluate the Coronavirus Aid,
Relief, and Economic Security Act, which allocated many billions of dollars
worth of relief funding to hospitals via an algorithmic rule. The funding is
shown to have little effect on COVID-19-related hospital activities. Naive
estimates exhibit selection bias.
( 2
min )
There has been a lot of work in question generation in which different methods
of providing target answers as input have been employed. This experimentation
has mostly been carried out for RNN-based models. We use three different
methods and their combinations for incorporating answer information and explore
their effect on several automatic evaluation metrics. The methods used are
answer prompting, a custom product method combining answer embeddings and
encoder outputs, choosing sentences from the input paragraph that contain
answer-related information, and using a separate cross-attention block in
the decoder which attends to the answer. We observe that answer prompting
without any additional modes obtains the best ROUGE and METEOR scores.
Additionally, we use a custom metric to calculate how many of the generated
questions have the same answer as the one used to generate them.
( 2
min )
We present a robust membership inference attack (RMIA) that amplifies the
distinction between population data and the training data on any target model,
by effectively leveraging both reference models and reference data in our
likelihood ratio test. Our algorithm exhibits superior test power
(true-positive rate) when compared to prior methods, even at extremely low
false-positive error rates (as low as 0). Also, under computation constraints,
where only a limited number of reference models (as few as 1) are available,
our method performs exceptionally well, unlike some prior attacks that approach
random guessing in such scenarios. Our method lays the groundwork for
cost-effective and practical yet powerful and robust privacy risk analysis of
machine learning algorithms.
( 2
min )
In causal models, a given mechanism is assumed to be invariant to changes of
other mechanisms. While this principle has been utilized for inference in
settings where the causal variables are observed, theoretical insights when the
variables of interest are latent are largely missing. We assay the connection
between invariance and causal representation learning by establishing
impossibility results which show that invariance alone is insufficient to
identify latent causal variables. Together with practical considerations, we
use these theoretical findings to highlight the need for additional constraints
in order to identify representations by exploiting invariance.
( 2
min )
Associated to each graph G is a Gaussian graphical model. Such models are
often used in high-dimensional settings, i.e. where there are relatively few
data points compared to the number of variables. The maximum likelihood
threshold of a graph is the minimum number of data points required to fit the
corresponding graphical model using maximum likelihood estimation. Graphical
lasso is a method for selecting and fitting a graphical model. In this project,
we ask: when graphical lasso is used to select and fit a graphical model on n
data points, how likely is it that n is greater than or equal to the maximum
likelihood threshold of the corresponding graph? Our results are a series of
computational experiments.
( 2
min )
Partially observable constrained optimization problems (POCOPs) impede
data-driven optimization techniques since an infeasible solution of POCOPs can
provide little information about the objective as well as the constraints. We
endeavor to design an efficient and provable method for expensive POCOPs under
the framework of constrained Bayesian optimization. Our method consists of two
key components. Firstly, we present an improved design of the acquisition
functions that introduces balanced exploration during optimization. We
rigorously study the convergence properties of this design to demonstrate its
effectiveness. Secondly, we propose a Gaussian process embedding different
likelihoods as the surrogate model for a partially observable constraint. This
model leads to a more accurate representation of the feasible regions compared
to traditional classification-based models. Our proposed method is empirically
studied on both synthetic and real-world problems. The results demonstrate the
competitiveness of our method for solving POCOPs.
( 2
min )
The central problem in materials science is to discover materials with desired properties. MatterGen enables broad property-guided materials design.
The post MatterGen: Property-guided materials design appeared first on Microsoft Research.
( 8
min )
Advanced prompting technologies for LLMs can lead to excessively long prompts, causing issues. Learn how LLMLingua compresses prompts up to 20x, maintaining quality, reducing latency, and supporting improved UX.
The post LLMLingua: Innovating LLM efficiency with prompt compression appeared first on Microsoft Research.
( 10
min )
Accessibility is a key element that all designers must consider before constructing a space or product — but the evaluation process has traditionally been tedious and time-consuming. Mathew Schwartz, an assistant professor in architecture and design at the New Jersey Institute of Technology, is using the NVIDIA Omniverse platform and the Universal Scene Description framework…
( 7
min )
It’s a fortuitous GFN Thursday with 17 new games joining the GeForce NOW library, including The Day Before, Avatar: Frontiers of Pandora and the 100th PC Game Pass title to join the cloud — Ori and the Will of the Wisps. This week also marks a milestone: over 500 games and applications now support RTX…
( 8
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Mehmet İkbal Özmen, Hasan Burak Yel, Fatma Nur Dumlupınar Keşir, Mutlu Polatcan and Emre Uzel from Getir. Getir is the pioneer of ultrafast grocery delivery. The technology company has revolutionized last-mile delivery with its grocery in-minutes delivery proposition. Getir was founded in 2015 and operates […]
( 8
min )
Using machine learning, the computational method can provide details of how materials work as catalysts, semiconductors, or battery components.
( 11
min )
Double descent presents a counter-intuitive aspect within the machine
learning domain, and researchers have observed its manifestation in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory to account for its
occurrence in deep learning remains yet to be established. In this study, we
revisit the phenomenon of double descent and demonstrate that its occurrence is
strongly influenced by the presence of noisy data. Through conducting a
comprehensive analysis of the feature space of learned representations, we
unveil that double descent arises in imperfect models trained with noisy data.
We argue that double descent is a consequence of the model first learning the
noisy data until interpolation and then, via the implicit regularization added
by over-parameterization, acquiring the capability to separate the information
from the noise.
( 2
min )
Adopting reasonable strategies is challenging but crucial for an intelligent
agent with limited resources working in hazardous, unstructured, and dynamic
environments to improve the system's utility, decrease the overall cost, and
increase mission success probability. This paper proposes a novel directed
acyclic strategy graph decomposition approach based on Bayesian chaining to
separate an intricate policy into several simple sub-policies and organize
their relationships as Bayesian strategy networks (BSN). We integrate this
approach into the state-of-the-art DRL method -- soft actor-critic (SAC), and
build the corresponding Bayesian soft actor-critic (BSAC) model by organizing
several sub-policies as a joint policy. We compare our method against the
state-of-the-art deep reinforcement learning algorithms on the standard
continuous control benchmarks in the OpenAI Gym environment. The results
demonstrate the promising potential of the BSAC method to significantly
improve training efficiency.
( 2
min )
Computational pathology models rarely utilise data that will not be available
for inference. This means most models cannot learn from highly informative data
such as additional immunohistochemical (IHC) stains and spatial
transcriptomics. We present TriDeNT, a novel self-supervised method for
utilising privileged data that is not available during inference to improve
performance. We demonstrate the efficacy of this method for a range of
different paired data including immunohistochemistry, spatial transcriptomics
and expert nuclei annotations. In all settings, TriDeNT outperforms other
state-of-the-art methods in downstream tasks, with observed improvements of up
to 101%. Furthermore, we provide qualitative and quantitative measurements of
the features learned by these models and how they differ from baselines.
TriDeNT offers a novel method to distil knowledge from scarce or costly data
during training, to create significantly better models for routine inputs.
( 2
min )
Guaranteeing safe behaviour of reinforcement learning (RL) policies poses
significant challenges for safety-critical applications, despite RL's
generality and scalability. To address this, we propose a new approach to apply
verification methods from control theory to learned value functions. By
analyzing task structures for safety preservation, we formalize original
theorems that establish links between value functions and control barrier
functions. Further, we propose novel metrics for verifying value functions in
safe control tasks and practical implementation details to improve learning.
Our work presents a novel method for certificate learning, which unlocks a
diversity of verification techniques from control theory for RL policies, and
marks a significant step towards a formal framework for the general, scalable,
and verifiable design of RL-based control systems. Code and videos are
available at https://rl-cbf.github.io/
( 2
min )
Physics-informed neural networks (PINNs) constitute a flexible approach to
both finding solutions and identifying parameters of partial differential
equations. Most works on the topic assume noiseless data, or data contaminated
with weak Gaussian noise. We show that the standard PINN framework breaks down
in case of non-Gaussian noise. We give a way of resolving this fundamental
issue and we propose to jointly train an energy-based model (EBM) to learn the
correct noise distribution. We illustrate the improved performance of our
approach using multiple examples.
( 2
min )
In this paper, we prove that an Adam-type algorithm with smooth clipping
approaches the global minimizer of the regularized non-convex loss function.
Adding smooth clipping and taking the state space as the set of all
trajectories, we can apply the ergodic theory of Markov semigroups for this
algorithm and investigate its asymptotic behavior. The ergodic theory we
establish in this paper reduces the problem of evaluating the convergence,
generalization error and discretization error of this algorithm to the problem
of evaluating the difference between two functional stochastic differential
equations (SDEs) with different drift coefficients. As a result of our
analysis, we show that this algorithm minimizes the regularized
non-convex loss function with errors of the form $n^{-1/2}$, $\eta^{1/4}$,
$\beta^{-1} \log (\beta + 1)$ and $e^{- c t}$. Here, $c$ is a constant and $n$,
$\eta$, $\beta$ and $t$ denote the size of the training dataset, learning rate,
inverse temperature and time, respectively.
( 2
min )
Knowledge tracing consists in predicting the performance of some students on
new questions given their performance on previous questions, and can be a prior
step to optimizing assessment and learning. Deep knowledge tracing (DKT) is a
competitive model for knowledge tracing relying on recurrent neural networks,
even if some simpler models may match its performance. However, little is known
about why DKT works so well. In this paper, we frame deep knowledge tracing as
an encoder-decoder architecture. This viewpoint not only allows us to propose
better models in terms of performance, simplicity or expressivity but also
opens up promising avenues for future research directions. In particular, we
show on several small and large datasets that a simpler decoder, with possibly
fewer parameters than the one used by DKT, can predict student performance
better.
( 2
min )
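To make the encoder-decoder reading concrete, here is a deliberately tiny stand-in (not the DKT architecture itself, which uses a recurrent encoder and a learned decoder): the encoder compresses a student's answer history into a per-skill state, and a simple logistic decoder maps that state to a success probability. All function names and constants are hypothetical.

```python
import math

def encode(history, n_skills, decay=0.8):
    """Encoder: summarize a student's (skill, correct) history as a
    per-skill exponentially weighted success estimate in [0, 1]."""
    state = [0.5] * n_skills            # uninformative prior per skill
    for skill, correct in history:
        state[skill] = decay * state[skill] + (1 - decay) * float(correct)
    return state

def decode(state, skill, bias=0.0, scale=4.0):
    """Decoder: map the latent state to P(next answer on `skill` correct)
    through a logistic link, a stand-in for DKT's learned output layer."""
    logit = scale * (state[skill] - 0.5) + bias
    return 1.0 / (1.0 + math.exp(-logit))

strong = [(0, True)] * 8   # student who keeps answering skill 0 correctly
weak = [(0, False)] * 8
p_strong = decode(encode(strong, n_skills=2), skill=0)
p_weak = decode(encode(weak, n_skills=2), skill=0)
```

The point of the framing is that the encoder and decoder can be varied independently, which is exactly the degree of freedom the paper exploits.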
Deep learning (DL) and machine learning (ML) applications have been growing
rapidly in recent years. Massive amounts of data are generated over the
internet, from which ML and DL algorithms can derive meaningful results.
Hardware resources and open-source libraries have made these algorithms easy
to implement. TensorFlow and PyTorch are two of the leading frameworks for
implementing ML projects. Using these frameworks, we can trace the operations
executed on both the GPU and the CPU to analyze resource allocation and
consumption. This paper presents the time and memory allocation of the CPU and
GPU while training deep neural networks using PyTorch. Our analysis shows that
the GPU has a lower running time than the CPU for deep neural networks, while
for simpler networks the GPU offers no significant improvement over the CPU.
( 2
min )
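The timing comparison above can be reproduced with a simple measurement harness. The sketch below is framework-agnostic (the workload, iteration counts, and function names are illustrative, not from the paper); with PyTorch on a GPU one would additionally call torch.cuda.synchronize() before each clock read, since CUDA kernels launch asynchronously.

```python
import math
import time

def time_workload(step, iters=20, warmup=3):
    """Average wall-clock time of one training step, in seconds.

    Warm-up iterations are excluded: the first calls often pay one-off
    costs (memory allocator growth, kernel compilation on GPU backends).
    """
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    return (time.perf_counter() - start) / iters

# stand-in "training step"; replace with a forward/backward pass in practice
step = lambda: sum(math.sin(i) for i in range(20000))
avg = time_workload(step)
```

Comparing `avg` for the same step on different devices gives the kind of CPU-versus-GPU running-time numbers the paper reports.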
The effectiveness of a model is heavily reliant on the quality of the fusion
representation of multiple modalities in multimodal sentiment analysis.
Moreover, each modality is extracted from raw input and integrated with the
rest to construct a multimodal representation. Although previous methods have
proposed multimodal representations and achieved promising results, most of
them focus on forming positive and negative pairs, neglecting the variation in
sentiment scores within the same class. Additionally, they fail to capture the
significance of unimodal representations in the fusion vector. To address these
limitations, we introduce a framework called Supervised Angular-based
Contrastive Learning for Multimodal Sentiment Analysis. This framework aims to
enhance discrimination and generalizability of the multimodal representation
and overcome biases in the fusion vector's modality. Our experimental results,
along with visualizations on two widely used datasets, demonstrate the
effectiveness of our approach.
( 2
min )
We discuss the fundamental issue of identification in linear instrumental
variable (IV) models with unknown IV validity. With the assumption of the
"sparsest rule", which is equivalent to the plurality rule but becomes
operational in computation algorithms, we investigate and prove the advantages
of non-convex penalized approaches over other IV estimators based on two-step
selections, in terms of selection consistency and accommodation for
individually weak IVs. Furthermore, we propose a surrogate sparsest penalty
that aligns with the identification condition and provides oracle sparse
structure simultaneously. Desirable theoretical properties are derived for the
proposed estimator with weaker IV strength conditions compared to the previous
literature. Finite sample properties are demonstrated using simulations and the
selection and estimation method is applied to an empirical study concerning the
effect of BMI on diastolic blood pressure.
( 2
min )
Most neural compression models are trained on large datasets of images or
videos in order to generalize to unseen data. Such generalization typically
requires large and expressive architectures with a high decoding complexity.
Here we introduce C3, a neural compression method with strong rate-distortion
(RD) performance that instead overfits a small model to each image or video
separately. The resulting decoding complexity of C3 can be an order of
magnitude lower than neural baselines with similar RD performance. C3 builds on
COOL-CHIC (Ladune et al.) and makes several simple and effective improvements
for images. We further develop new methodology to apply C3 to videos. On the
CLIC2020 image benchmark, we match the RD performance of VTM, the reference
implementation of the H.266 codec, with less than 3k MACs/pixel for decoding.
On the UVG video benchmark, we match the RD performance of the Video
Compression Transformer (Mentzer et al.), a well-established neural video
codec, with less than 5k MACs/pixel for decoding.
( 2
min )
This paper presents a method for finding a sparse representation of Barron
functions. Specifically, given an $L^2$ function $f$, the inverse scale space
flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between
the Barron function associated to the measure $\mu$ and the function $f$. The
convergence properties of this method are analysed in an ideal setting and in
the cases of measurement noise and sampling bias. In an ideal setting the
objective decreases strictly monotonically in time to a minimizer at rate
$\mathcal{O}(1/t)$, and in the case of measurement noise or sampling bias the
optimum is achieved up to a multiplicative or additive constant. This
convergence is preserved on discretization of the parameter space, and the
minimizers on increasingly fine discretizations converge to the optimum on the
full parameter space.
( 2
min )
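A minimal finite-dimensional sketch of the idea, using the linearized Bregman discretization of the inverse scale space flow on a toy sparse recovery problem with an orthonormal dictionary (the paper's Barron-function parameter space is replaced by plain coordinates purely for illustration):

```python
import numpy as np

def shrink(v, lam):
    """Soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def inverse_scale_space(A, f, lam=0.1, steps=200):
    """Discretized inverse scale space flow (linearized Bregman iteration).

    Large coefficients enter the support first, so early stopping yields
    increasingly sparse approximations of the solution of A u = f.
    """
    v = np.zeros(A.shape[1])
    u = np.zeros(A.shape[1])
    for _ in range(steps):
        v += A.T @ (f - A @ u)        # dual variable integrates the residual
        u = shrink(v, lam)            # primal iterate stays sparse
    return u

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((60, 30)))   # orthonormal columns
u_true = np.zeros(30)
u_true[[3, 11, 27]] = [1.0, -0.7, 0.5]
u_hat = inverse_scale_space(A, A @ u_true)
```

With orthonormal columns the flow recovers the sparse coefficients exactly, coordinate by coordinate; the paper's analysis concerns the far harder measure-valued setting with noise and sampling bias.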
The Street View House Numbers (SVHN) dataset is a popular benchmark dataset
in deep learning. Originally designed for digit classification tasks, the SVHN
dataset has been widely used as a benchmark for various other tasks including
generative modeling. However, with this work, we aim to warn the community
about an issue of the SVHN dataset as a benchmark for generative modeling
tasks: we discover that the official split into training set and test set of
the SVHN dataset are not drawn from the same distribution. We empirically show
that this distribution mismatch has little impact on the classification task
(which may explain why this issue has not been detected before), but it
severely affects the evaluation of probabilistic generative models, such as
Variational Autoencoders and diffusion models. As a workaround, we propose to
mix and re-split the official training and test set when SVHN is used for tasks
other than classification. We publish a new split and the indices we used to
create it at https://jzenn.github.io/svhn-remix/ .
( 2
min )
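The proposed workaround amounts to pooling the official splits and re-splitting at the original sizes with a fixed seed. A sketch (loading SVHN itself, e.g. via torchvision.datasets.SVHN, is assumed and omitted; 73257 and 26032 are the official cropped-digit split sizes):

```python
import numpy as np

def remix_split(n_train, n_test, seed=0):
    """Pool two splits and re-split them at the same sizes.

    Returns index arrays into the concatenated dataset
    (original training set first, then original test set)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train + n_test)
    return perm[:n_train], perm[n_train:]

# SVHN's official cropped-digit split sizes
train_idx, test_idx = remix_split(73257, 26032)
```

Because the permutation is seeded, the new split is reproducible, which is what publishing the indices (as the authors do) guarantees across groups.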
Toronto Pearson International Airport, in Ontario, Canada, is the country’s largest and busiest airport, serving some 50 million passengers each year. To enhance traveler experiences, the airport in June deployed the Zensors AI platform, which uses anonymized footage from existing security cameras to generate spatial data that helps optimize operations in real time. A member […]
( 7
min )
Move over, Merriam-Webster: Enterprises this year found plenty of candidates to add for word of the year. “Generative AI” and “generative pretrained transformer” were followed by terms such as “large language models” and “retrieval-augmented generation” (RAG) as whole industries turned their attention to transformative new technologies. Generative AI started the year as a blip on […]
( 17
min )
A new era of autonomous vehicle technology, known as AV 2.0, has emerged, marked by large, unified AI models that can control multiple parts of the vehicle stack, from perception and planning to control. Wayve, a London-based autonomous driving technology company, is leading the surf. In the latest episode of NVIDIA’s AI Podcast, host Katie […]
( 6
min )
Despite the seemingly unstoppable adoption of LLMs across industries, they are one component of a broader technology ecosystem that is powering the new AI wave. Many conversational AI use cases require LLMs like Llama 2, Flan T5, and Bloom to respond to user queries. These models rely on parametric knowledge to answer questions. The model […]
( 11
min )
Summarization is the technique of condensing sizable information into a compact and meaningful form, and stands as a cornerstone of efficient communication in our information-rich age. In a world full of data, summarizing long texts into brief summaries saves time and helps make informed decisions. Summarization condenses content, saving time and improving clarity by presenting […]
( 13
min )
Conversational AI has come a long way in recent years thanks to the rapid developments in generative AI, especially the performance improvements of large language models (LLMs) introduced by training techniques such as instruction fine-tuning and reinforcement learning from human feedback. When prompted correctly, these models can carry coherent conversations without any task-specific training data. […]
( 18
min )
This post is co-written with Stanislav Yeshchenko from Q4 Inc. Enterprises turn to Retrieval Augmented Generation (RAG) as a mainstream approach to building Q&A chatbots. We continue to see emerging challenges stemming from the nature of the assortment of datasets available. These datasets are often a mix of numerical and text data, at times structured, […]
( 18
min )
Explore the latest AI innovations aiming to advance the software development lifecycle. AdaptivePaste adapts and refines pasted code snippets in an IDE. InferFix automates bug detection and repair. Discover how.
The post Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow appeared first on Microsoft Research.
( 10
min )
Research Focus: Using LLMs in a Rust-based formal verification framework; Rethinking network measurements with user feedback; 3D telemedicine using Holoportation™ communication technology could enhance overseas surgical visits.
The post Research Focus: Week of December 4, 2023 appeared first on Microsoft Research.
( 9
min )
During 18 years of leadership, Evans established new R&D mission areas, strengthened ties to the MIT community, and increased inclusion and education efforts.
( 11
min )
The data-driven approach to robot control has been gathering pace rapidly,
yet generalization to unseen task domains remains a critical challenge. We
argue that the key to generalization is representations that are (i) rich
enough to capture all task-relevant information and (ii) invariant to
superfluous variability between the training and the test domains. We
experimentally study such a representation -- containing both depth and
semantic information -- for visual navigation and show that it enables a
control policy trained entirely in simulated indoor scenes to generalize to
diverse real-world environments, both indoors and outdoors. Further, we show
that our representation reduces the A-distance between the training and test
domains, improving the generalization error bound as a result. Our proposed
approach is scalable: the learned policy improves continuously, as the
foundation models that it exploits absorb more diverse data during
pre-training.
( 2
min )
Denoising is intuitively related to projection. Indeed, under the manifold
hypothesis, adding random noise is approximately equivalent to orthogonal
perturbation. Hence, learning to denoise is approximately learning to project.
In this paper, we use this observation to reinterpret denoising diffusion
models as approximate gradient descent applied to the Euclidean distance
function. We then provide a straightforward convergence analysis of the DDIM
sampler under simple assumptions on the projection-error of the denoiser.
Finally, we propose a new sampler based on two simple modifications to DDIM
using insights from our theoretical results. In as few as 5-10 function
evaluations, our sampler achieves state-of-the-art FID scores on pretrained
CIFAR-10 and CelebA models and can generate high quality samples on latent
diffusion models.
( 2
min )
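The denoising-as-projection view can be checked on a toy manifold: for data on the unit circle, the ideal denoiser is radial projection, and iterating the residual update is exactly gradient descent on half the squared distance to the circle. Everything below is an illustrative construction, not the paper's sampler.

```python
import math

def denoise(x):
    """Ideal denoiser for data on the unit circle: radial projection."""
    r = math.hypot(x[0], x[1])
    return (x[0] / r, x[1] / r)

def sample(x, eta=0.5, steps=20):
    """Iterated denoising as gradient descent on d(x, manifold)^2 / 2:
    the residual x - denoise(x) is exactly the distance gradient here."""
    for _ in range(steps):
        d = denoise(x)
        x = (x[0] - eta * (x[0] - d[0]), x[1] - eta * (x[1] - d[1]))
    return x

x = sample((3.0, 4.0))          # start well off the manifold
radius = math.hypot(*x)         # should approach 1
```

The iterate converges geometrically onto the circle while preserving its direction, which is the intuition behind interpreting DDIM steps as approximate projections.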
This paper proposes a multiblock alternating direction method of multipliers
for solving a class of multiblock nonsmooth nonconvex optimization problems with
nonlinear coupling constraints. We employ a majorization minimization procedure
in the update of each block of the primal variables. Subsequential and global
convergence of the generated sequence to a critical point of the augmented
Lagrangian are proved. We also establish iteration complexity and provide
preliminary numerical results for the proposed algorithm.
( 2
min )
Signal Temporal Logic (STL) is a powerful framework for describing the
complex temporal and logical behaviour of dynamical systems. Numerous
studies have attempted to employ reinforcement learning to learn a controller
that enforces STL specifications; however, they have been unable to effectively
tackle the challenges of ensuring robust satisfaction in continuous state space
and maintaining tractability. In this paper, leveraging the concept of funnel
functions, we propose a tractable reinforcement learning algorithm to learn a
time-dependent policy for robust satisfaction of STL specification in
continuous state space. We demonstrate the utility of our approach on several
STL tasks using different environments.
( 2
min )
Hippocampal atrophy in Alzheimer's disease (AD) is asymmetric and spatially
inhomogeneous. While extensive work has been done on volume and shape analysis
of atrophy of the hippocampus in AD, less attention has been given to
hippocampal asymmetry specifically. Previous studies of hippocampal asymmetry
are limited to global volume or shape measures, which don't localize shape
asymmetry at the point level. In this paper, we propose to quantify localized
shape asymmetry by optimizing point correspondences between left and right
hippocampi within a subject, while simultaneously favoring a compact
statistical shape model of the entire sample. To account for related variables
that have impact on AD and healthy subject differences, we build linear models
with other confounding factors. Our results on the OASIS3 dataset demonstrate
that compared to using volumetric information, shape asymmetry reveals
fine-grained, localized differences that indicate the hippocampal regions of
most significant shape asymmetry in AD patients.
( 2
min )
This work introduces BRILLsson, a novel binary neural network-based
representation learning model for a broad range of non-semantic speech tasks.
We train the model with knowledge distillation from a large and real-valued
TRILLsson model with only a fraction of the dataset used to train TRILLsson.
The resulting BRILLsson models are only 2 MB in size with a latency of less
than 8 ms, making them suitable for deployment on low-resource devices such as
wearables. We evaluate BRILLsson on eight benchmark tasks (including but not
limited to spoken language identification, emotion recognition, health
condition diagnosis, and keyword spotting), and demonstrate that our proposed
ultra-light and low-latency models perform as well as large-scale models.
( 2
min )
This paper proposes a weakly-supervised machine learning-based approach
aiming at a tool to alert patients about possible respiratory diseases. Various
types of pathologies may affect the respiratory system, potentially leading to
severe diseases and, in certain cases, death. In general, effective prevention
practices are considered as major actors towards the improvement of the
patient's health condition. The proposed method strives to realize an easily
accessible tool for the automatic diagnosis of respiratory diseases.
Specifically, the method leverages Variational Autoencoder architectures,
which permit training pipelines of limited complexity and relatively
small-sized datasets. Importantly, it offers an accuracy of 57%, which is in
line with existing strongly-supervised approaches.
( 2
min )
Information Extraction (IE) seeks to derive structured information from
unstructured texts, often facing challenges in low-resource scenarios due to
data scarcity and unseen classes. This paper presents a review of neural
approaches to low-resource IE from \emph{traditional} and \emph{LLM-based}
perspectives, systematically categorizing them into a fine-grained taxonomy.
We then conduct an empirical study on LLM-based methods compared with previous
state-of-the-art models, and discover that (1) well-tuned LMs are still
predominant; (2) tuning open-resource LLMs and ICL with GPT family is promising
in general; (3) the optimal LLM-based technical solution for low-resource IE
can be task-dependent. In addition, we discuss low-resource IE with LLMs,
highlight promising applications, and outline potential research directions.
This survey aims to foster understanding of this field, inspire new ideas, and
encourage widespread applications in both academia and industry.
( 2
min )
Since ChatGPT works so well, are we on the cusp of solving science with AI?
Is not AlphaFold2 suggestive that the potential of LLMs in biology and the
sciences more broadly is limitless? Can we use AI itself to bridge the lack of
data in the sciences in order to then train an AI? Herein we present a
discussion of these topics.
( 2
min )
When visualizing a high-dimensional dataset, dimension reduction techniques
are commonly employed, which provide a single two-dimensional view of the data. We
describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously
that generalizes the t-Stochastic Neighborhood Embedding approach. By using
different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different
types of clusters within the same high-dimensional dataset. This enables the
viewer to see and keep track of the different types of clusters, which is
harder to do when providing multiple 2D embeddings, where corresponding points
cannot be easily identified. We illustrate the utility of ENS-t-SNE with
real-world applications and provide an extensive quantitative evaluation with
datasets of different types and sizes.
( 2
min )
Traditional partial differential equation (PDE) solvers can be
computationally expensive, which motivates the development of faster methods,
such as reduced-order models (ROMs). We present GPLaSDI, a hybrid deep-learning
and Bayesian ROM. GPLaSDI trains an autoencoder on full-order-model (FOM) data
and simultaneously learns simpler equations governing the latent space. These
equations are interpolated with Gaussian Processes, allowing for uncertainty
quantification and active learning, even with limited access to the FOM solver.
Our framework achieves a speed-up of up to 100,000 times and less than 7%
relative error on fluid mechanics problems.
( 2
min )
Training neural networks that require adversarial optimization, such as
generative adversarial networks (GANs) and unsupervised domain adaptations
(UDAs), suffers from instability. This instability problem comes from the
difficulty of the minimax optimization, and there have been various approaches
in GANs and UDAs to overcome this problem. In this study, we tackle this
problem theoretically through a functional analysis. Specifically, we show the
convergence property of the minimax problem by the gradient descent over the
infinite-dimensional spaces of continuous functions and probability measures
under certain conditions. Using this setting, we can discuss GANs and UDAs
comprehensively, which have been studied independently. In addition, we show
that the conditions necessary for the convergence property are interpreted as
stabilization techniques of adversarial training such as the spectral
normalization and the gradient penalty.
( 2
min )
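One of the stabilization techniques the analysis covers, spectral normalization, divides a weight matrix by an estimate of its largest singular value obtained by power iteration. A sketch follows (iterated to convergence here for clarity; GAN implementations typically run a single step per training update with the vector persisted across steps):

```python
import numpy as np

def spectral_normalize(W, n_iter=200, eps=1e-12):
    """Estimate sigma_max(W) by power iteration and return W / sigma.

    Dividing by the spectral norm bounds the Lipschitz constant of the
    corresponding linear layer by 1, which stabilizes adversarial training."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v                  # Rayleigh-quotient estimate of sigma_max
    return W / sigma, sigma

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))
W_sn, sigma = spectral_normalize(W)
```

After normalization the largest singular value of `W_sn` is 1, so the layer cannot amplify perturbations, which is exactly the Lipschitz-type condition the convergence theory interprets.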
Normative models in neuroimaging learn the brain patterns of healthy
population distribution and estimate how disease subjects like Alzheimer's
Disease (AD) deviate from the norm. Existing variational autoencoder
(VAE)-based normative models using multimodal neuroimaging data aggregate
information from multiple modalities by estimating product or averaging of
unimodal latent posteriors. This can often lead to uninformative joint latent
distributions which affects the estimation of subject-level deviations. In this
work, we addressed the prior limitations by adopting the
Mixture-of-Product-of-Experts (MoPoE) technique which allows better modelling
of the joint latent posterior. Our model labelled subjects as outliers by
calculating deviations from the multimodal latent space. Further, we identified
which latent dimensions and brain regions were associated with abnormal
deviations due to AD pathology.
( 2
min )
In 2023, online payment fraud cost the world US$48 billion. Businesses prioritize fighting payment fraud and minimizing its financial and reputational damage. In addition to monetary losses, payment fraud can damage a customer’s trust and loyalty, as well as increase the scrutiny from regulators and law enforcement. Organizations use machine learning to combat this growing… Read More »Decoding the Future: The Intersection of Advanced Analytics and Fraud Prevention in Revolutionizing Digital Payments
The post Decoding the Future: The Intersection of Advanced Analytics and Fraud Prevention in Revolutionizing Digital Payments appeared first on Data Science Central.
( 22
min )
Large language model (LLM) training has become increasingly popular over the last year with the release of several publicly available models such as Llama2, Falcon, and StarCoder. Customers are now training LLMs of unprecedented size ranging from 1 billion to over 175 billion parameters. Training these LLMs requires significant compute resources and time as hundreds […]
( 8
min )
Structured data, defined as data following a fixed pattern such as information stored in columns within databases, and unstructured data, which lacks a specific form or pattern like text, images, or social media posts, both continue to grow as they are produced and consumed by various organizations. For instance, according to International Data Corporation (IDC), […]
( 13
min )
The post describes how you can overcome the challenges of retaining data ownership and preserving data privacy while using LLMs by deploying Protopia AI’s Stained Glass Transform to protect your data. Protopia AI has partnered with AWS to deliver the critical component of data protection and ownership for secure and efficient enterprise adoption of generative AI. This post outlines the solution and demonstrates how it can be used in AWS for popular enterprise use cases like Retrieval Augmented Generation (RAG) and with state-of-the-art LLMs like Llama 2.
( 12
min )
Many patients in low- and middle-income countries rely on facilitated online health communities for information and support. Discover how large language models can assist the facilitators and boost outcomes.
The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.
( 10
min )
Cecily Morrison and Karolina Pakėnaitė are collaborators on a research prototype designed to help members of the blind community find their personal items. Learn how the work is advancing an approach to empower people to shape their own AI experiences.
The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.
( 28
min )
‘Tis the season for friends, family and beautifully rendered Santa animations from this week’s In the NVIDIA Studio artist, 3D expert Božo Balov.
( 7
min )
A new, data-driven approach could lead to better solutions for tricky optimization problems like global package routing or power grid operation.
( 9
min )
Based on the standard VMAF implementation, we propose an implementation of
VMAF using the PyTorch framework. For this implementation, comparisons with the
standard (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ in VMAF units. We
investigate gradient computation when using VMAF as an objective function and
demonstrate that training with this function does not result in ill-behaved
gradients. The implementation is then used to train a preprocessing filter,
whose performance is demonstrated to be superior to that of the unsharp
masking filter. The resulting filter is also easy to implement and can be
applied in video processing tasks to improve video compression. This is
confirmed by the results of numerical experiments.
( 2
min )
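For reference, the unsharp masking baseline mentioned above sharpens a signal by boosting its detail layer, out = x + amount * (x - blur(x)). A 1-D sketch with a simple box blur (the kernel size and amount are illustrative choices, not the paper's):

```python
import numpy as np

def box_blur(x, k=3):
    """Simple moving-average blur with edge padding."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    kernel = np.ones(k) / k
    return np.convolve(xp, kernel, mode="valid")

def unsharp_mask(x, amount=1.0, k=3):
    """Unsharp masking: boost the detail layer x - blur(x)."""
    return x + amount * (x - box_blur(x, k))

signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # a step edge
sharpened = unsharp_mask(signal)
```

The characteristic over- and undershoot around the edge is visible in the output; a learned preprocessing filter trained against a differentiable quality metric can avoid such fixed artifacts.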
We consider a setting where a population of artificial learners is given, and
the objective is to optimize aggregate measures of performance, under
constraints on training resources. The problem is motivated by the study of
peer learning in human educational systems. In this context, we study natural
knowledge diffusion processes in networks of interacting artificial learners.
By `natural', we mean processes that reflect human peer learning where the
students' internal state and learning process is mostly opaque, and the main
degree of freedom lies in the formation of peer learning groups by a
coordinator who can potentially evaluate the learners before assigning them to
peer groups. Among other things, we empirically show that such processes indeed make
effective use of the training resources, and enable the design of modular
neural models that have the capacity to generalize without being prone to
overfitting noisy labels.
( 2
min )
In this paper we consider the numerical solution to the soft-margin support
vector machine optimization problem. This problem is typically solved using the
SMO algorithm, given the high computational complexity of traditional
optimization algorithms when dealing with large-scale kernel matrices. In this
work, we propose employing an NFFT-accelerated matrix-vector product using an
ANOVA decomposition for the feature space that is used within an interior point
method for the overall optimization problem. As this method requires the
solution of a linear system of saddle point form we suggest a preconditioning
approach that is based on low-rank approximations of the kernel matrix together
with a Krylov subspace solver. We compare the accuracy of the ANOVA-based
kernel with the default LIBSVM implementation. We investigate the performance
of the different preconditioners as well as the accuracy of the ANOVA kernel on
several large-scale datasets.
( 2
min )
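The preconditioner rests on a low-rank approximation of the kernel matrix. One standard construction is the Nyström approximation from m landmark points, K ~ C W^+ C^T (the paper's exact construction may differ; the data and parameters below are illustrative):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom(X, m, gamma=0.5, seed=0):
    """Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T built from
    m randomly chosen landmark points."""
    idx = np.random.default_rng(seed).choice(len(X), m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)      # n x m cross-kernel block
    W = C[idx]                            # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
K = rbf_kernel(X, X)
K_approx = nystrom(X, m=60)
err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Because RBF kernel matrices on low-dimensional data have rapidly decaying spectra, a modest rank already gives a small relative error, which is what makes such factors effective inside a Krylov-solver preconditioner.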
In this paper, we aim to explore the use of uplink semantic communications
with the assistance of a UAV in order to improve data collection efficiency for
metaverse users in remote areas. To reduce the time for uplink data collection
while balancing the trade-off between reconstruction quality and computational
energy cost, we propose a hybrid action reinforcement learning (RL) framework
to make decisions on semantic model scale, channel allocation, transmission
power, and UAV trajectory. The variables are classified into discrete type and
continuous type, which are optimized by two different RL agents to generate the
combined action. Simulation results indicate that the proposed hybrid action
reinforcement learning framework can effectively improve the efficiency of
uplink semantic data collection under different parameter settings and
outperforms the benchmark scenarios.
( 2
min )
Bug reports are an essential aspect of software development, and it is
crucial to identify and resolve them quickly to ensure the consistent
functioning of software systems. Retrieving similar bug reports from an
existing database can help reduce the time and effort required to resolve bugs.
In this paper, we compared the effectiveness of semantic textual similarity
methods for retrieving similar bug reports based on a similarity score. We
explored several embedding models such as TF-IDF (Baseline), FastText, Gensim,
BERT, and ADA. We used the Software Defects Data containing bug reports for
various software projects to evaluate the performance of these models. Our
experimental results showed that BERT generally outperformed the rest of the
models regarding recall, followed by ADA, Gensim, FastText, and TF-IDF. Our
study provides insights into the effectiveness of different embedding methods
for retrieving similar bug reports and highlights the impact of selecting the
appropriate one for this task. Our code is available on GitHub.
( 2
min )
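The TF-IDF baseline in miniature: bug reports and the query are embedded as sparse TF-IDF vectors and ranked by cosine similarity. The tiny corpus below is invented for illustration, not drawn from the Software Defects Data:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed idf
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

reports = [
    "app crashes on startup after update".split(),
    "crash when launching the app".split(),
    "font rendering is blurry on retina displays".split(),
]
query = "application crash on startup".split()
vecs = tfidf_vectors(reports + [query])   # idf shared across corpus + query
qv, report_vecs = vecs[-1], vecs[:-1]
scores = [cosine(qv, v) for v in report_vecs]
best = max(range(len(scores)), key=scores.__getitem__)
```

Note the lexical brittleness: "crash" and "crashes" do not match, which is precisely the gap that embedding models such as BERT close.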
Extracting the rules of real-world multi-agent behaviors is a current
challenge in various scientific and engineering fields. Individual biological
agents have limited observation and mechanical constraints; however,
most of the conventional data-driven models ignore such assumptions, resulting
in lack of biological plausibility and model interpretability for behavioral
analyses. Here we propose sequential generative models with partial observation
and mechanical constraints in a decentralized manner, which can model agents'
cognition and body dynamics, and predict biologically plausible behaviors. We
formulate this as a decentralized multi-agent imitation-learning problem,
leveraging binary partial observation and decentralized policy models based on
hierarchical variational recurrent neural networks with physical and
biomechanical penalties. Using real-world basketball and soccer datasets, we
show the effectiveness of our method in terms of the constraint violations,
long-term trajectory prediction, and partial observation. Our approach can be
used as a multi-agent simulator to generate realistic trajectories using
real-world data.
( 2
min )
The Shapley value is widely regarded as a trustworthy attribution metric.
However, when people use Shapley values to explain the attribution of input
variables of a deep neural network (DNN), it usually requires a very high
computational cost to approximate relatively accurate Shapley values in
real-world applications. Therefore, we propose a novel network architecture,
the HarsanyiNet, which makes inferences on the input sample and simultaneously
computes the exact Shapley values of the input variables in a single forward
propagation. The HarsanyiNet is designed on the theoretical foundation that the
Shapley value can be reformulated as the redistribution of Harsanyi
interactions encoded by the network.
( 2
min )
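The exponential cost that the HarsanyiNet sidesteps is visible in the classical definition: exact Shapley values require enumerating every coalition of input variables. A minimal brute-force sketch on a toy value function (illustrative only, not the HarsanyiNet itself):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values of value function f over n players,
    enumerating all coalitions (O(2^n) evaluations)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k not containing i
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                phi[i] += w * (f(set(S) | {i}) - f(set(S)))
    return phi

# Toy value function: v(S) = (sum of member weights)^2
weights = [1.0, 2.0, 3.0]
v = lambda S: sum(weights[j] for j in S) ** 2
phi = shapley_values(v, 3)
# Efficiency axiom: values sum to v(N) - v(empty) = 36
print(phi, sum(phi))  # → [6.0, 12.0, 18.0] 36.0
```

For this quadratic game each player's value works out to its weight times the total weight, which the efficiency check confirms.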
Learning disentangled causal representations is a challenging problem that
has gained significant attention recently due to its implications for
extracting meaningful information for downstream tasks. In this work, we define
a new notion of causal disentanglement from the perspective of independent
causal mechanisms. We propose ICM-VAE, a framework for learning causally
disentangled representations supervised by causally related observed labels. We
model causal mechanisms using learnable flow-based diffeomorphic functions to
map noise variables to latent causal variables. Further, to promote the
disentanglement of causal factors, we propose a causal disentanglement prior
that utilizes the known causal structure to encourage learning a causally
factorized distribution in the latent space. Under relatively mild conditions,
we provide theoretical results showing the identifiability of causal factors
and mechanisms up to permutation and elementwise reparameterization. We
empirically demonstrate that our framework induces highly disentangled causal
factors, improves interventional robustness, and is compatible with
counterfactual generation.
( 2
min )
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established bounds differ by a factor that is
logarithmic in the width.
( 2
min )
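For a shallow random ReLU network of the kind studied above, the Lipschitz constant can be bracketed empirically; a sketch using a He-style initialization with a symmetric bias distribution (dimensions and sample counts are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 512  # input dimension, hidden width

# He-style init: W ~ N(0, 2/d); symmetric (standard normal) biases
W = rng.normal(0.0, np.sqrt(2.0 / d), size=(m, d))
b = rng.normal(0.0, 1.0, size=m)
v = rng.normal(0.0, np.sqrt(2.0 / m), size=m)

def grad_norm(x):
    """||grad f(x)|| for f(x) = v . ReLU(Wx + b);
    the gradient is the sum of v_i * W_i over active units."""
    active = (W @ x + b) > 0
    g = (v[active, None] * W[active]).sum(axis=0)
    return np.linalg.norm(g)

# Max gradient norm over random inputs lower-bounds Lip(f);
# the spectral-norm product ||v|| * ||W||_2 is a crude upper bound.
lower = max(grad_norm(rng.normal(size=d)) for _ in range(2000))
upper = np.linalg.norm(v) * np.linalg.norm(W, 2)
print(lower, upper)
```

The paper's result says that for shallow networks these two quantities differ only by an absolute numerical constant in expectation; the sketch just makes both bounds computable.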
In this paper, we put forth a novel framework (named "RYU") for the
construction of "safe" balls, i.e., regions that provably contain the dual
solution of a target optimization problem. We concentrate on the standard setup
where the cost function is the sum of two terms: a closed, proper, convex
Lipschitz-smooth function and a closed, proper, convex function. The RYU
framework is shown to generalize or improve upon all the results proposed in
the last decade for the considered family of optimization problems.
( 2
min )
Graph contrastive learning has shown great promise when labeled data is
scarce, but large unlabeled datasets are available. However, it often does not
take uncertainty estimation into account. We show that a variational Bayesian
neural network approach can be used to improve not only the uncertainty
estimates but also the downstream performance on semi-supervised
node-classification tasks. Moreover, we propose a new measure of uncertainty
for contrastive learning that is based on the disagreement in likelihood due
to different positive samples.
( 2
min )
We present an efficient parameter-free approach for statistical learning from
corrupted training sets. We identify corrupted and non-corrupted samples using
latent Bernoulli variables, and therefore formulate the robust learning problem
as maximization of the likelihood where latent variables are marginalized out.
The resulting optimization problem is solved via variational inference using an
efficient Expectation-Maximization based method. The proposed approach improves
over the state-of-the-art by automatically inferring the corruption level and
identifying outliers, while adding minimal computational overhead. We
demonstrate our robust learning method on a wide variety of machine learning
tasks including online learning and deep learning where it exhibits ability to
adapt to different levels of noise and attain high prediction accuracy.
( 2
min )
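A minimal illustration of the latent-Bernoulli idea (a 1-D toy with a fixed uniform outlier component; not the paper's algorithm): each EM iteration infers the posterior probability that a sample is uncorrupted, then refits the inlier model and the corruption level from those weights.

```python
import numpy as np

rng = np.random.default_rng(1)
# 80% inliers around 5.0, 20% gross outliers
x = np.concatenate([rng.normal(5.0, 1.0, 400), rng.uniform(-50, 50, 100)])

mu, sigma, pi = 0.0, 5.0, 0.5   # inlier mean, inlier std, inlier fraction
out_pdf = 1.0 / 100.0            # fixed broad uniform outlier density on [-50, 50]

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior probability each sample is an inlier (latent Bernoulli)
    r = pi * norm_pdf(x, mu, sigma)
    r = r / (r + (1 - pi) * out_pdf)
    # M-step: weighted updates of the inlier model and the corruption level
    mu = (r * x).sum() / r.sum()
    sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum())
    pi = r.mean()

print(round(mu, 2), round(pi, 2))  # mu near 5, pi near the true inlier rate
```

The corruption level `pi` is inferred rather than supplied, which is the parameter-free property the abstract highlights.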
Canonical Correlation Analysis (CCA) has been widely applied to jointly embed
multiple views of data in a maximally correlated latent space. However, the
alignment between various data perspectives, which is required by traditional
approaches, is unclear in many practical cases. In this work we propose a new
framework Aligned Canonical Correlation Analysis (ACCA), to address this
challenge by iteratively solving the alignment and multi-view embedding.
( 2
min )
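For reference, classical CCA on already-aligned views reduces to an SVD of the whitened cross-covariance; ACCA's alternating alignment step is the paper's contribution and is not shown in this sketch (the data here are synthetic, with a shared 2-D latent signal):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, k = 500, 6, 4, 2

# Two aligned views driven by a shared k-dimensional latent signal plus noise
z = rng.normal(size=(n, k))
X = z @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
Y = z @ rng.normal(size=(k, q)) + 0.1 * rng.normal(size=(n, q))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy, Cxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

def inv_sqrt(C):
    """Symmetric inverse square root via the eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

# Canonical correlations = singular values of the whitened cross-covariance
_, s, _ = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy))
print(np.round(s, 3))  # top two near 1 (shared latent), rest near 0
```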
This paper elucidates the challenges and opportunities inherent in
integrating data-driven methodologies into geotechnics, drawing inspiration
from the success of materials informatics. Highlighting the intricacies of soil
complexity, heterogeneity, and the lack of comprehensive data, the discussion
underscores the pressing need for community-driven database initiatives and
open science movements. By leveraging the transformative power of deep
learning, particularly in feature extraction from high-dimensional data and the
potential of transfer learning, we envision a paradigm shift towards a more
collaborative and innovative geotechnics field. The paper concludes with a
forward-looking stance, emphasizing the revolutionary potential brought about
by advanced computational tools like large language models in reshaping
geotechnics informatics.
( 2
min )
This paper aims to define, quantify, and analyze the feature complexity that
is learned by a DNN. We propose a generic definition for the feature
complexity. Given the feature of a certain layer in the DNN, our method
disentangles feature components of different complexity orders from the
feature. We further design a set of metrics to evaluate the reliability, the
effectiveness, and the significance of over-fitting of these feature
components. Furthermore, we successfully discover a close relationship between
the feature complexity and the performance of DNNs. As a generic mathematical
tool, the feature complexity and the proposed metrics can also be used to
analyze the success of network compression and knowledge distillation.
( 2
min )
To enhance the gaming experience, studios and developers spend tremendous effort creating photorealistic, immersive in-game environments. But non-playable characters (NPCs) often get left behind. Many behave in ways that lack depth and realism, making their interactions repetitive and forgettable. Inworld AI is changing the game by using generative AI to drive NPC behaviors that are […]
( 6
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Hasan Burak Yel, and Damla Şentürk from Getir. Established in 2015, Getir has positioned itself as the trailblazer in the sphere of ultrafast grocery delivery. This innovative tech company has revolutionized the last-mile delivery segment with its compelling offering of “groceries in minutes.” With a […]
( 7
min )
The recent upheavals at OpenAI and its Chief Scientist’s apprehensions regarding the “safety” of AI have ignited a fresh wave of concerns and fears about the march towards Artificial General Intelligence (AGI) and “Super Intelligence.” AI safety concerns the development of AI systems that are aligned with human values and do not cause harm to humans. Some […]
The post A Different AI Scenario: AI and Justice in a Brave New World – Part 1 appeared first on Data Science Central.
( 22
min )
Climate hazards can cause major disasters when they occur simultaneously as
compound hazards. To understand the distribution of climate risk and inform
adaptation policies, scientists need to simulate a large number of physically
realistic and spatially coherent events. Current methods are limited by
computational constraints, and the probabilistic spatial distribution of
compound events is not given sufficient attention. The bottleneck in current
approaches lies in modelling the dependence structure between variables, as
inference on parametric models suffers from the curse of dimensionality.
Generative adversarial networks (GANs) are well-suited to such a problem due to
their ability to implicitly learn the distribution of data in high-dimensional
settings. We employ a GAN to model the dependence structure for daily maximum
wind speed, significant wave height, and total precipitation over the Bay of
Bengal, combining this with traditional extreme value theory for controlled
extrapolation of the tails. Once trained, the model can be used to efficiently
generate thousands of realistic compound hazard events, which can inform
climate risk assessments for climate adaptation and disaster preparedness. The
method developed is flexible and transferable to other multivariate and spatial
climate datasets.
( 2
min )
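The "controlled extrapolation of the tails" step can be illustrated with a peaks-over-threshold model. This sketch uses the simplest case, an exponential excess distribution (a generalized Pareto with shape zero), fitted to synthetic data rather than the paper's GAN output:

```python
import numpy as np

rng = np.random.default_rng(0)
speeds = rng.exponential(scale=8.0, size=20000)  # synthetic daily max wind speeds

u = np.quantile(speeds, 0.95)   # threshold for peaks-over-threshold
exc = speeds[speeds > u] - u
scale = exc.mean()              # MLE of the exponential excess scale
p_u = (speeds > u).mean()       # empirical exceedance probability of u

def p_exceed(x):
    """Estimated P(speed > x) for x above the threshold: tail extrapolation."""
    return p_u * np.exp(-(x - u) / scale)

# Extrapolate beyond the largest observation in the sample
print(p_exceed(2 * speeds.max()))
```

A full treatment would fit both generalized Pareto parameters per location and splice the parametric tail onto the GAN-learned body of the distribution.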
Inference of community structure in probabilistic graphical models may not be
consistent with fairness constraints when nodes have demographic attributes.
Certain demographics may be over-represented in some detected communities and
under-represented in others. This paper defines a novel $\ell_1$-regularized
pseudo-likelihood approach for fair graphical model selection. In particular,
we assume there is some community or clustering structure in the true
underlying graph, and we seek to learn a sparse undirected graph and its
communities from the data such that demographic groups are fairly represented
within the communities. In the case when the graph is known a priori, we
provide a convex semidefinite programming approach for fair community
detection. We establish the statistical consistency of the proposed method for
both a Gaussian graphical model and an Ising model for, respectively,
continuous and binary data, proving that our method can recover the graphs and
their fair communities with high probability.
( 2
min )
Analyzing large-scale time-series network data, such as social media and
email communications, poses a significant challenge in understanding social
dynamics, detecting anomalies, and predicting trends. In particular, the
scalability of graph analysis is a critical hurdle impeding progress in
large-scale downstream inference. To address this challenge, we introduce a
temporal encoder embedding method. This approach leverages ground-truth or
estimated vertex labels, enabling an efficient embedding of large-scale graph
data and the processing of billions of edges within minutes. Furthermore, this
embedding unveils a temporal dynamic statistic capable of detecting
communication pattern shifts across all levels, ranging from individual
vertices to vertex communities and the overall graph structure. We provide
theoretical support to confirm its soundness under random graph models, and
demonstrate its numerical advantages in capturing evolving communities and
identifying outliers. Finally, we showcase the practical application of our
approach by analyzing an anonymized time-series communication network from a
large organization spanning 2019-2020, enabling us to assess the impact of
Covid-19 on workplace communication patterns.
( 3
min )
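The label-based encoder embedding at the heart of this approach admits a compact sketch (a static, undirected toy case; the temporal machinery is omitted): each vertex is embedded by its average connectivity to each community, which costs time linear in the number of edges.

```python
import numpy as np

def encoder_embedding(A, labels, K):
    """Embed each vertex by its average connectivity to each of K communities."""
    n = A.shape[0]
    counts = np.bincount(labels, minlength=K)
    W = np.zeros((n, K))
    W[np.arange(n), labels] = 1.0 / counts[labels]  # column-normalized one-hot
    return A @ W                                    # O(edges * K) with sparse A

# Two-block random graph: dense within blocks, sparse across
rng = np.random.default_rng(0)
n = 200
labels = np.array([0] * 100 + [1] * 100)
P = np.where(labels[:, None] == labels[None, :], 0.3, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T  # symmetric, no self-loops

Z = encoder_embedding(A, labels, 2)
print(Z[:3].round(2))  # block-0 vertices: high first coordinate, low second
```

In the temporal setting the same embedding is computed per snapshot, and shifts in the per-community statistics flag communication pattern changes.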
This paper studies the one-shot behavior of no-regret algorithms for
stochastic bandits. Although many algorithms are known to be asymptotically
optimal with respect to the expected regret, over a single run, their
pseudo-regret seems to follow one of two tendencies: it is either smooth or
bumpy. To measure this tendency, we introduce a new notion: the sliding regret,
that measures the worst pseudo-regret over a time-window of fixed length
sliding to infinity. We show that randomized methods (e.g. Thompson Sampling
and MED) have optimal sliding regret, while index policies, although possibly
asymptotically optimal for the expected regret, have the worst possible sliding
regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB,
MOSS, IMED etc.). We further analyze the average bumpiness of the pseudo-regret
of index policies via the regret of exploration, that we show to be suboptimal
as well.
( 2
min )
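The sliding regret can be computed directly from a single run's pseudo-regret curve; a sketch with two synthetic runs of equal total regret (constructed for illustration, not generated by actual bandit policies):

```python
import numpy as np

def sliding_regret(pseudo_regret, tau):
    """Worst pseudo-regret accumulated over any window of length tau."""
    r = np.asarray(pseudo_regret, dtype=float)
    return max(r[t + tau] - r[t] for t in range(len(r) - tau))

T = 1000
# A "smooth" run accrues regret at a steady rate; a "bumpy" run accrues the
# same total in short linear bursts (as index policies can, per the abstract).
smooth = 0.25 * np.arange(T + 1)
inc = np.zeros(T)
for s in range(0, T, 200):
    inc[s:s + 50] = 1.0          # bursts of per-step regret 1
bumpy = np.concatenate([[0.0], np.cumsum(inc)])

print(sliding_regret(smooth, 50), sliding_regret(bumpy, 50))  # 12.5 vs 50.0
```

Both curves end at the same pseudo-regret of 250, yet the windowed statistic separates them, which is exactly the distinction the expected regret cannot see.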
Lipschitz continuity is a crucial functional property of any predictive
model, that naturally governs its robustness, generalisation, as well as
adversarial vulnerability. Contrary to other works that focus on obtaining
tighter bounds and developing different practical strategies to enforce certain
Lipschitz properties, we aim to thoroughly examine and characterise the
Lipschitz behaviour of Neural Networks. Thus, we carry out an empirical
investigation in a range of different settings (namely, architectures,
datasets, label noise, and more) by exhausting the limits of the simplest and
the most general lower and upper bounds. As a highlight of this investigation,
we showcase a remarkable fidelity of the lower Lipschitz bound, identify a
striking Double Descent trend in both upper and lower bounds to the Lipschitz
and explain the intriguing effects of label noise on function smoothness and
generalisation.
( 2
min )
The Fermat distance has been recently established as a useful tool for
machine learning tasks when a natural distance is not directly available to the
practitioner or to improve the results given by Euclidean distances by
exploiting the geometrical and statistical properties of the dataset. This
distance depends on a parameter $\alpha$ that greatly impacts the performance
of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to
navigate the geometric intricacies inherent to the problem. At the same time, it
should remain restrained enough to sidestep any deleterious ramifications
stemming from noise during the process of distance estimation. We study both
theoretically and through simulations how to select this parameter.
( 2
min )
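The sample Fermat distance itself is a shortest-path computation with edge costs $\|x_i - x_j\|^\alpha$; a brute-force sketch via Floyd-Warshall on a tiny point set shows how $\alpha > 1$ rewards paths through intermediate sample points:

```python
import numpy as np

def fermat_distance(points, alpha):
    """All-pairs sample Fermat distance: shortest paths over the complete
    graph with edge cost ||x_i - x_j||^alpha (Floyd-Warshall)."""
    diff = points[:, None, :] - points[None, :, :]
    D = np.linalg.norm(diff, axis=-1) ** alpha
    for k in range(len(points)):
        D = np.minimum(D, D[:, k, None] + D[None, k, :])
    return D

# Three collinear points: with alpha=2 the hop through the midpoint is cheaper
pts = np.array([[0.0], [0.5], [1.0]])
D = fermat_distance(pts, alpha=2.0)
print(D[0, 2])  # 0.5^2 + 0.5^2 = 0.5, vs the direct cost 1.0^2 = 1.0
```

This is why the choice of $\alpha$ matters: larger values let paths hug dense regions of the data, but also amplify the effect of noisy edge lengths.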
Virtually all machine learning tasks are characterized using some form of
loss function, and "good performance" is typically stated in terms of a
sufficiently small average loss, taken over the random draw of test data. While
optimizing for performance on average is intuitive, convenient to analyze in
theory, and easy to implement in practice, such a choice brings about
trade-offs. In this work, we survey and introduce a wide variety of
non-traditional criteria used to design and evaluate machine learning
algorithms, place the classical paradigm within the proper historical context,
and propose a view of learning problems which emphasizes the question of "what
makes for a desirable loss distribution?" in place of tacit use of the expected
loss.
( 2
min )
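The survey's central question can be made concrete with two synthetic loss distributions that share a mean but differ in their tails, compared under conditional value-at-risk (CVaR), one of the non-traditional criteria in this line of work:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two models with (nearly) the same average test loss but different tails
losses_a = rng.normal(1.0, 0.1, 100000)                       # tight around 1
losses_b = np.abs(rng.standard_t(df=3, size=100000)) * 0.91   # heavy right tail

def cvar(losses, beta=0.95):
    """Conditional value-at-risk: mean loss within the worst (1-beta) fraction."""
    q = np.quantile(losses, beta)
    return losses[losses >= q].mean()

print(round(losses_a.mean(), 2), round(losses_b.mean(), 2))  # nearly equal means
print(round(cvar(losses_a), 2), round(cvar(losses_b), 2))    # very different tails
```

Optimizing the expected loss would rank these two models as roughly equivalent; a tail-sensitive criterion does not.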
This paper presents a comprehensive comparative analysis of the performance
of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks
(QNN), juxtaposed against their classical counterparts: Equivariant Neural
Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of
each network with two toy examples for a binary classification task, focusing
on model complexity (measured by the number of parameters) and the size of the
training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$
EQNN and the QNN provide superior performance for smaller parameter sets and
modest training data samples.
( 2
min )
The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. From data collection and cleaning to feature engineering, model building, tuning, and deployment, ML projects often take months for developers to complete. And experienced data […]
( 10
min )
Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. As we continue to innovate to increase data science productivity, we’re excited to announce the improved SageMaker Studio experience, which allows users to select the managed Integrated Development Environment (IDE) […]
( 6
min )
As organizations scale the adoption of machine learning (ML), they are looking for efficient and reliable ways to deploy new infrastructure and onboard teams to ML environments. One of the challenges is setting up authentication and fine-grained permissions for users based on their roles and activities. For example, MLOps engineers typically perform model deployment activities, […]
( 8
min )
PwR uses domain-specific languages to bridge communication between developers and AI tools. Learn how it can help simplify code creation and enhance software reliability and customization, no matter your coding expertise.
The post PwR: Using representations for AI-powered software development appeared first on Microsoft Research.
( 10
min )
Concept erasure in text-to-image diffusion models aims to disable pre-trained
diffusion models from generating images related to a target concept. To perform
reliable concept erasure, the properties of robustness and locality are
desirable. The former prevents the model from producing images associated with
the target concept for any paraphrased or learned prompts, while the latter
preserves the model's ability to generate images for non-target concepts. In
this paper, we propose Reliable Concept Erasing via Lightweight Erasers
(Receler), which learns a lightweight Eraser to perform concept erasing and
enhances locality and robustness with the proposed concept-localized
regularization and adversarial prompt learning, respectively. Comprehensive
quantitative and qualitative experiments with various concept prompts verify
the superiority of Receler over the previous erasing methods on the above two
desirable properties.
( 2
min )
Multivariate time series have many applications, from healthcare and
meteorology to life science. Although deep learning models have shown excellent
predictive performance for time series, they have been criticised for being
"black-boxes" or non-interpretable. This paper proposes a novel modular neural
network model for multivariate time series prediction that is interpretable by
construction. A recurrent neural network learns the temporal dependencies in
the data while an attention-based feature selection component selects the most
relevant features and suppresses redundant features used in the learning of the
temporal dependencies. A modular deep network is trained from the selected
features independently to show the users how features influence outcomes,
making the model interpretable. Experimental results show that this approach
can outperform state-of-the-art interpretable Neural Additive Models (NAM) and
variations thereof in both regression and classification of time series tasks,
achieving a predictive performance that is comparable to the top
non-interpretable methods for time series, LSTM and XGBoost.
( 2
min )
The difficulty of judging whether a property is priced fairly hinders buyers
and sellers, who usually lack an objective view of the price distribution for
their market of interest. Drawing on data covering all available properties for
rent in Manhattan as of September 2023, this
paper aims to strengthen our understanding of model residuals; specifically on
machine learning models which generalize for a majority of the distribution of
a well-proportioned dataset. Most models treat deviations from predicted values
as mere inaccuracies; this paper, however, proposes a different vantage point:
when a model generalizes to at least 75% of the dataset, the
remaining deviations reveal significant insights. To harness these insights, we
introduce the Price Anomaly Score (PAS), a metric capable of capturing
boundaries between irregularly predicted prices. By combining relative pricing
discrepancies with statistical significance, PAS
offers a multifaceted view of rental valuations. This metric allows experts to
identify overpriced or underpriced properties within a dataset by aggregating
PAS values, then fine-tuning upper and lower boundaries to any threshold to set
indicators of choice.
( 3
min )
Traditional multi-view stereo (MVS) methods rely heavily on photometric and
geometric consistency constraints, but newer machine learning-based MVS methods
check geometric consistency across multiple source views only as a
post-processing step. In this paper, we present a novel approach that
explicitly encourages geometric consistency of reference view depth maps across
multiple source views at different scales during learning (see Fig. 1). We find
that adding this geometric consistency loss significantly accelerates learning
by explicitly penalizing geometrically inconsistent pixels, reducing the
training iteration requirements to nearly half that of other MVS methods. Our
extensive experiments show that our approach achieves a new state-of-the-art on
the DTU and BlendedMVS datasets, and competitive results on the Tanks and
Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt
to enforce multi-view, multi-scale geometric consistency during learning.
( 2
min )
Text-To-Image (TTI) models, such as DALL-E and StableDiffusion, have
demonstrated remarkable prompt-based image generation capabilities.
Multilingual encoders may have a substantial impact on the cultural agency of
these models, as language is a conduit of culture. In this study, we explore
the cultural perception embedded in TTI models by characterizing culture across
three hierarchical tiers: cultural dimensions, cultural domains, and cultural
concepts. Based on this ontology, we derive prompt templates to unlock the
cultural knowledge in TTI models, and propose a comprehensive suite of
evaluation techniques, including intrinsic evaluations using the CLIP space,
extrinsic evaluations with a Visual-Question-Answer (VQA) model and human
assessments, to evaluate the cultural content of TTI-generated images. To
bolster our research, we introduce the CulText2I dataset, derived from four
diverse TTI models and spanning ten languages. Our experiments provide insights
regarding Do, What, Which and How research questions about the nature of
cultural encoding in TTI models, paving the way for cross-cultural applications
of these models.
( 2
min )
Hyperparameter Optimization (HPO) of deep learning-based models tends to be a
compute-intensive process, as it usually requires training the target
model with many different hyperparameter configurations. We show that
integrating model performance prediction with early stopping methods holds
great potential to speed up the HPO process of deep learning models. Moreover,
we propose a novel algorithm called Swift-Hyperband that can use either
classical or quantum support vector regression for performance prediction and
benefit from distributed High Performance Computing environments. This
algorithm is tested not only for the Machine-Learned Particle Flow model used
in High Energy Physics, but also for a wider range of target models from
domains such as computer vision and natural language processing.
Swift-Hyperband is shown to find comparable (or better) hyperparameters while
using fewer computational resources in all test cases.
( 2
min )
Tensor network (TN) representation is a powerful technique for computer
vision and machine learning. TN structure search (TN-SS) aims to search for a
customized structure to achieve a compact representation, which is a
challenging NP-hard problem. Recent "sampling-evaluation-based" methods require
sampling an extensive collection of structures and evaluating them one by one,
resulting in prohibitively high computational costs. To address this issue, we
propose a novel TN paradigm, named SVD-inspired TN decomposition (SVDinsTN),
which allows us to efficiently solve the TN-SS problem from a regularized
modeling perspective, eliminating the repeated structure evaluations. To be
specific, by inserting a diagonal factor for each edge of the fully-connected
TN, SVDinsTN allows us to calculate TN cores and diagonal factors
simultaneously, with the factor sparsity revealing a compact TN structure. In
theory, we prove a convergence guarantee for the proposed method. Experimental
results demonstrate that the proposed method achieves approximately 100 to 1000
times acceleration compared to the state-of-the-art TN-SS methods while
maintaining a comparable representation ability.
( 2
min )
The unstructured nature of data used in foundation model development is a
challenge to systematic analyses for making data use and documentation
decisions. From a Responsible AI perspective, these decisions often rely upon
understanding how people are represented in data. We propose a framework
designed to guide analysis of human representation in unstructured data and
identify downstream risks. We apply the framework in two toy examples using the
Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of
hypothetical action steps in service of dataset use, development, and
documentation.
( 2
min )
Crop management decision support systems are specialized tools for farmers
that reduce the riskiness of revenue streams, which is especially valuable under
the current climate changes that impact agricultural productivity.
Unfortunately, small farmers in India, who could greatly benefit from these
tools, do not have access to them. In this paper, we model an individual
greenhouse as a Markov Decision Process (MDP) and adapt Li and Li (2019)'s
Follow the Weighted Leader (FWL) online learning algorithm to offer crop
planning advice. We successfully produce utility-preserving cropping pattern
suggestions in simulations. When we compare against an offline planning
algorithm, we achieve the same cumulative revenue with greatly reduced runtime.
( 2
min )
Generative models can produce impressively realistic images. This paper
demonstrates that generated images have geometric features different from those
of real images. We build a set of collections of generated images, prequalified
to fool simple, signal-based classifiers into believing they are real. We then
show that prequalified generated images can be identified reliably by
classifiers that only look at geometric properties. We use three such
classifiers. All three classifiers are denied access to image pixels, and look
only at derived geometric features. The first classifier looks at the
perspective field of the image, the second looks at lines detected in the
image, and the third looks at relations between detected objects and shadows.
Our procedure detects generated images more reliably than SOTA local signal
based detectors, for images from a number of distinct generators. Saliency maps
suggest that the classifiers can identify geometric problems reliably. We
conclude that current generators cannot reliably reproduce geometric properties
of real images.
( 2
min )
Model-agnostic anomaly detection is one of the promising approaches in the
search for new beyond the standard model physics. In this paper, we present
Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection
algorithm. We demonstrate a 2x signal efficiency gain compared with traditional
subjettiness-based jet selection. Furthermore, with an eye to the future
deployment to trigger systems, we propose the CLIP-VAE, which reduces the
inference-time cost of anomaly detection by using the KL-divergence loss as the
anomaly score, resulting in a 2x acceleration in latency and reducing the
caching requirement.
( 2
min )
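Using the KL term as the anomaly score only needs the encoder half of the VAE at inference time, and it has a closed form for a diagonal-Gaussian posterior. A sketch with made-up posterior parameters standing in for encoder outputs (the actual Set-VAE/CLIP-VAE encoders are not shown):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form, per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

# A posterior for an in-distribution jet sits near the prior;
# an anomalous jet's posterior drifts away, yielding a larger score.
mu_bg = np.array([0.1, -0.2, 0.0]); lv_bg = np.array([-0.1, 0.0, 0.1])
mu_sig = np.array([2.5, -3.0, 1.5]); lv_sig = np.array([-1.0, -1.2, -0.8])

print(kl_to_standard_normal(mu_bg, lv_bg), kl_to_standard_normal(mu_sig, lv_sig))
```

Skipping the decoder pass is what buys the latency and caching savings reported above.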
Evaluating the accuracy of outputs generated by Large Language Models (LLMs)
is especially important in the climate science and policy domain. We introduce
the Expert Confidence in Climate Statements (ClimateX) dataset, a novel,
curated, expert-labeled dataset consisting of 8094 climate statements collected
from the latest Intergovernmental Panel on Climate Change (IPCC) reports,
labeled with their associated confidence levels. Using this dataset, we show
that recent LLMs can classify human expert confidence in climate-related
statements, especially in a few-shot learning setting, but with limited (up to
47%) accuracy. Overall, models exhibit consistent and significant
over-confidence on low and medium confidence statements. We highlight
implications of our results for climate communication, LLMs evaluation
strategies, and the use of LLMs in information retrieval systems.
( 2
min )
Although much work has been done on explainability in the computer vision and
natural language processing (NLP) fields, there is still much work to be done
to explain methods applied to time series, as time series by nature cannot be
understood at first sight. In this paper, we present a Deep Neural Network
(DNN) in a teacher-student architecture (distillation model) that offers
interpretability in time-series classification tasks. The explainability of our
approach is based on transforming the time series to 2D plots and applying
image highlight methods (such as LIME and GradCam), making the predictions
interpretable. At the same time, the proposed approach offers increased
accuracy competing with the baseline model with the trade-off of increasing the
training time.
( 2
min )
Astronomical transients, such as supernovae and other rare stellar
explosions, have been instrumental in some of the most significant discoveries
in astronomy. New astronomical sky surveys will soon record unprecedented
numbers of transients as sparsely and irregularly sampled multivariate time
series. To improve our understanding of the physical mechanisms of transients
and their progenitor systems, early-time measurements are necessary.
Prioritizing the follow-up of transients based on their age along with their
class is crucial for new surveys. To meet this demand, we present the first
method of predicting the age of transients in real-time from multi-wavelength
time-series observations. We build a Bayesian probabilistic recurrent neural
network. Our method can accurately predict the age of a transient with robust
uncertainties as soon as it is initially triggered by a survey telescope. This
work will be essential for the advancement of our understanding of the numerous
young transients being detected by ongoing and upcoming astronomical surveys.
( 2
min )
We introduce a diffusion-based generative model to describe the distribution
of galaxies in our Universe directly as a collection of points in 3-D space
(coordinates) optionally with associated attributes (e.g., velocities and
masses), without resorting to binning or voxelization. The custom diffusion
model can be used both for emulation, reproducing essential summary statistics
of the galaxy distribution, as well as inference, by computing the conditional
likelihood of a galaxy field. We demonstrate a first application to massive
dark matter haloes in the Quijote simulation suite. This approach can be
extended to enable a comprehensive analysis of cosmological data, circumventing
limitations inherent to summary-statistic-based as well as neural
simulation-based inference methods.
( 2
min )
The aim of this short note is to show that Denoising Diffusion Probabilistic
Model DDPM, a non-homogeneous discrete-time Markov process, can be represented
by a time-homogeneous continuous-time Markov process observed at non-uniformly
sampled discrete times. Surprisingly, this continuous-time Markov process is
the well-known and well-studied Ornstein-Uhlenbeck (OU) process, which was
developed in the 1930s for studying Brownian particles in harmonic potentials. We
establish the formal equivalence between DDPM and the OU process using its
analytical solution. We further demonstrate that the design problem of the
noise scheduler for non-homogeneous DDPM is equivalent to designing observation
times for the OU process. We present several heuristic designs for observation
times based on principled quantities such as auto-variance and Fisher
Information and connect them to ad hoc noise schedules for DDPM. Interestingly,
we show that the Fisher-Information-motivated schedule corresponds exactly to the
cosine schedule, which was developed without any theoretical foundation but is
the current state-of-the-art noise schedule.
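The correspondence can be summarized as follows (standard formulations, our notation): the DDPM forward marginal and the OU process read
\[
q(x_k \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar\alpha_k}\, x_0,\ (1-\bar\alpha_k) I\right),
\qquad
dX_t = -\tfrac{1}{2} X_t\, dt + dW_t,
\quad
X_t \mid X_0 \sim \mathcal{N}\!\left(e^{-t/2} X_0,\ (1 - e^{-t}) I\right),
\]
so observing the OU process at times $t_k = -\log \bar\alpha_k$ reproduces the DDPM marginals exactly; designing a noise schedule $\{\bar\alpha_k\}$ is therefore the same problem as choosing the observation times $\{t_k\}$.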
( 2
min )
Diffusion models excel at generating photo-realistic images but come with
significant computational costs in both training and sampling. While various
techniques address these computational challenges, a less-explored issue is
designing an efficient and adaptable network backbone for iterative refinement.
Current options like U-Net and Vision Transformer often rely on
resource-intensive deep networks and lack the flexibility needed for generating
images at variable resolutions or with a smaller network than used in training.
This study introduces LEGO bricks, which seamlessly integrate Local-feature
Enrichment and Global-content Orchestration. These bricks can be stacked to
create a test-time reconfigurable diffusion backbone, allowing selective
skipping of bricks to reduce sampling costs and generate higher-resolution
images than the training data. LEGO bricks enrich local regions with an MLP and
transform them using a Transformer block while maintaining a consistent
full-resolution image across all bricks. Experimental results demonstrate that
LEGO bricks enhance training efficiency, expedite convergence, and facilitate
variable-resolution image generation while maintaining strong generative
performance. Moreover, LEGO significantly reduces sampling time compared to
other methods, establishing it as a valuable enhancement for diffusion models.
( 2
min )
Causal inference studies whether the presence of a variable influences an
observed outcome. As measured by quantities such as the "average treatment
effect," this paradigm is employed across numerous biological fields, from
vaccine and drug development to policy interventions. Unfortunately, the
majority of these methods are often limited to univariate outcomes. Our work
generalizes causal estimands to outcomes with any number of dimensions or any
measurable space, and formulates traditional causal estimands for nominal
variables as causal discrepancy tests. We propose a simple technique for
adjusting universally consistent conditional independence tests and prove that
these tests are universally consistent causal discrepancy tests. Numerical
experiments illustrate that our method, Causal CDcorr, leads to improvements in
both finite sample validity and power when compared to existing strategies. Our
methods are all open source and available at github.com/ebridge2/cdcorr.
( 2
min )
There are a number of available methods for selecting whom to prioritize for
treatment, including ones based on treatment effect estimation, risk scoring,
and hand-crafted rules. We propose rank-weighted average treatment effect
(RATE) metrics as a simple and general family of metrics for comparing and
testing the quality of treatment prioritization rules. RATE metrics are
agnostic as to how the prioritization rules were derived, and only assess how
well they identify individuals that benefit the most from treatment. We define
a family of RATE estimators and prove a central limit theorem that enables
asymptotically exact inference in a wide variety of randomized and
observational study settings. RATE metrics subsume a number of existing
metrics, including the Qini coefficient, and our analysis directly yields
inference methods for these metrics. We showcase RATE in the context of a
number of applications, including optimal targeting of aspirin to stroke
patients.
( 2
min )
Synthetic data (SD) have garnered attention as a privacy enhancing
technology. Unfortunately, there is no standard for quantifying their degree of
privacy protection. In this paper, we discuss proposed quantification
approaches. This contributes to the development of SD privacy standards;
stimulates multi-disciplinary discussion; and helps SD researchers make
informed modeling and evaluation decisions.
( 2
min )
We believe generative AI has the potential over time to transform virtually every customer experience we know. The number of companies launching generative AI applications on AWS is substantial and building quickly, including adidas, Booking.com, Bridgewater Associates, Clariant, Cox Automotive, GoDaddy, and LexisNexis Legal & Professional, to name just a few. Innovative startups like Perplexity […]
( 26
min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at scale. SageMaker makes it easy to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. SageMaker provides […]
( 12
min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and effortlessly build, train, and deploy machine learning (ML) models at any scale. SageMaker makes it straightforward to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. Although […]
( 15
min )
Today, we are excited to announce support for Code Editor, a new integrated development environment (IDE) option in Amazon SageMaker Studio. Code Editor is based on Code-OSS, Visual Studio Code Open Source, and provides access to the familiar environment and tools of the popular IDE that machine learning (ML) developers know and love, fully integrated […]
( 9
min )
As democratization of foundation models (FMs) becomes more prevalent and demand for AI-augmented services increases, software as a service (SaaS) providers are looking to use machine learning (ML) platforms that support multiple tenants—for data scientists internal to their organization and external customers. More and more companies are realizing the value of using FMs to generate […]
( 17
min )
As organizations deploy models to production, they are constantly looking for ways to optimize the performance of their foundation models (FMs) running on the latest accelerators, such as AWS Inferentia and GPUs, so they can reduce their costs and decrease response latency to provide the best experience to end-users. However, some FMs don’t fully utilize […]
( 13
min )
Amazon SageMaker makes it straightforward to deploy machine learning (ML) models for real-time inference and offers a broad selection of ML instances spanning CPUs and accelerators such as AWS Inferentia. As a fully managed service, you can scale your model deployments, minimize inference costs, and manage your models more effectively in production with reduced operational […]
( 6
min )
Amazon SageMaker Canvas is a no-code workspace that enables analysts and citizen data scientists to generate accurate machine learning (ML) predictions for their business needs. Starting today, SageMaker Canvas supports advanced model build configurations such as selecting a training method (ensemble or hyperparameter optimization) and algorithms, customizing the training and validation data split ratio, and […]
( 12
min )
Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Creating a resilient environment that can handle failures and environmental changes without losing days or weeks of model training progress is an operational challenge that requires you to […]
( 10
min )
Digital publishers are continuously looking for ways to streamline and automate their media workflows to generate and publish new content as rapidly as they can, but without foregoing quality. Adding images to capture the essence of text can improve the reading experience. Machine learning techniques can help you discover such images. “A striking image is […]
( 10
min )
The risks associated with generative AI have been well-publicized. Toxicity, bias, escaped PII, and hallucinations negatively impact an organization’s reputation and damage customer trust. Research shows that not only do risks for bias and toxicity transfer from pre-trained foundation models (FM) to task-specific generative AI services, but that tuning an FM for specific tasks, on […]
( 13
min )
Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. With this integration, SageMaker Canvas provides customers with an end-to-end no-code workspace to prepare data, build and use ML and […]
( 7
min )
In the last few years Large Language Models (LLMs) have risen to prominence as outstanding tools capable of understanding, generating and manipulating text with unprecedented proficiency. Their potential applications span from conversational agents to content generation and information retrieval, holding the promise of revolutionizing all industries. However, harnessing this potential while ensuring the responsible and […]
( 15
min )
In today’s rapidly evolving landscape of artificial intelligence, deep learning models have found themselves at the forefront of innovation, with applications spanning computer vision (CV), natural language processing (NLP), and recommendation systems. However, the increasing cost associated with training and fine-tuning these models poses a challenge for enterprises. This cost is primarily driven by the […]
( 8
min )
In November 2023, MarketsandMarkets announced the publication of its Knowledge Graph Market report. In its announcement, M&M estimated the 2023 global knowledge graph market at $0.9 billion, forecasting market growth to $2.4 billion by 2028, a compound annual growth rate of 21.9 percent. M&M also listed these 12 “key players” in its announcement: I haven’t…
The post A few large enterprise software provider strategies for the knowledge graph market appeared first on Data Science Central.
( 21
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is
crucial for understanding tumor growth dynamics and designing personalized
radiotherapy treatment plans. Mathematical models of GBM growth can complement
the data in the prediction of spatial distributions of tumor cells. However,
this requires estimating patient-specific parameters of the model from clinical
data, which is a challenging inverse problem due to limited temporal data and
the limited time between imaging and diagnosis. This work proposes a method
that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific
parameters of a reaction-diffusion PDE model of GBM growth from a single 3D
structural MRI snapshot. PINNs embed both the data and the PDE into a loss
function, thus integrating theory and data. Key innovations include the
identification and estimation of characteristic non-dimensional parameters, a
pre-training step that utilizes the non-dimensional parameters and a
fine-tuning step to determine the patient-specific parameters. Additionally,
the diffuse domain method is employed to handle the complex brain geometry
within the PINN framework. Our method is validated both on synthetic and
patient datasets, and shows promise for real-time parametric inference in the
clinical setting for personalized GBM treatment.
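The reaction-diffusion model commonly used for GBM growth, consistent with the description above, is the Fisher-KPP equation, and the PINN loss combines data and PDE-residual terms (a sketch of the standard setup, not necessarily the paper's exact formulation):
\[
\frac{\partial u}{\partial t} = \nabla \cdot (D \nabla u) + \rho\, u (1 - u),
\qquad
\mathcal{L}(\theta) = \mathcal{L}_{\text{data}}(\theta) + \lambda\, \mathcal{L}_{\text{PDE}}(\theta),
\]
where $u$ is the normalized tumor cell density, $D$ the diffusion coefficient, and $\rho$ the proliferation rate; the patient-specific parameters to be inferred are $(D, \rho)$ or their non-dimensional combinations.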
( 2
min )
Electroanatomical mapping is a technique used in cardiology to create a
detailed 3D map of the electrical activity in the heart. It is useful for
diagnosis, treatment planning and real time guidance in cardiac ablation
procedures to treat arrhythmias like atrial fibrillation. A probabilistic
machine learning model trained on a library of CT/MRI scans of the heart can be
used during electroanatomical mapping to generate a patient-specific 3D model
of the chamber being mapped. The use of probabilistic machine learning models
under a Bayesian framework provides a way to quantify uncertainty in results
and provide a natural framework of interpretability of the model. Here we
introduce a Bayesian approach to surface reconstruction of cardiac chamber
models from a sparse 3D point cloud data acquired during electroanatomical
mapping. We show how probabilistic graphical models trained on segmented CT/MRI
data can be used to generate cardiac chamber models from few acquired locations
thereby reducing procedure time and x-ray exposure. We also show how these
models provide insight into what the neural network learns from the segmented
CT/MRI training images, lending explainability to the resulting cardiac chamber
models.
( 2
min )
We study the sample complexity of identifying the pure strategy Nash
equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally,
we are given a stochastic model where any learner can sample an entry $(i,j)$
of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where
$\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to
identify the PSNE of $A$, whenever it exists, with high probability while
taking as few samples as possible. Zhou et al. (2017) presents an
instance-dependent sample complexity lower bound that depends only on the
entries in the row and column in which the PSNE lies. We design a near-optimal
algorithm whose sample complexity matches the lower bound, up to log factors.
The problem of identifying the PSNE also generalizes the problem of pure
exploration in stochastic multi-armed bandits and dueling bandits, and our
result matches the optimal bounds, up to log factors, in both settings.
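In the noiseless limit, identifying the PSNE reduces to finding a saddle point of the payoff matrix. A minimal sketch of that deterministic check (illustrative only, not the paper's sampling algorithm):

```python
import numpy as np

def find_psne(A):
    """Return a pure strategy Nash equilibrium (i, j) of a zero-sum game
    with payoff matrix A (row player maximizes), or None if no saddle
    point exists: A[i, j] must be maximal in its column and minimal in
    its row."""
    n, m = A.shape
    for i in range(n):
        for j in range(m):
            if A[i, j] >= A[:, j].max() and A[i, j] <= A[i, :].min():
                return (i, j)
    return None
```

The sample-efficient setting replaces these exact comparisons with confidence bounds built from repeated noisy observations of the matrix entries.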
( 2
min )
Large language models (LLMs) aligned to human preferences via reinforcement
learning from human feedback (RLHF) underpin many commercial applications of
LLM technology. Despite this, the impacts of RLHF on LLM internals remain
opaque. We propose a novel method for interpreting implicit reward models
(IRMs) in LLMs learned through RLHF. Our approach trains pairs of autoencoders
on activations from a base LLM and its RLHF-tuned variant. Through a comparison
of autoencoder hidden spaces, we identify features that reflect the accuracy of
the learned IRM. To illustrate our method, we fine-tune an LLM via RLHF to
learn a token-utility mapping and maximize the aggregate utility of generated
text. This is the first application of sparse autoencoders to interpreting
IRMs. Our method provides an abstract approximation of reward integrity and
holds promise for measuring alignment between specified objectives and learned
model behaviors.
( 2
min )
Many problems in machine learning can be formulated as solving
entropy-regularized optimal transport on the space of probability measures. The
canonical approach involves the Sinkhorn iterates, renowned for their rich
mathematical properties. Recently, the Sinkhorn algorithm has been recast
within the mirror descent framework, thus benefiting from classical
optimization theory insights. Here, we build upon this result by introducing a
continuous-time analogue of the Sinkhorn algorithm. This perspective allows us
to derive novel variants of Sinkhorn schemes that are robust to noise and bias.
Moreover, our continuous-time dynamics not only generalize but also offer a
unified perspective on several recently discovered dynamics in machine learning
and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or
the "mean-field Schr\"odinger equation" of (Claisse et al. 2023).
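For reference, the canonical discrete-time Sinkhorn iterates that the continuous-time dynamics generalize can be sketched as follows (a minimal illustration for histograms; names and defaults are our own):

```python
import numpy as np

def sinkhorn(a, b, C, eps=1.0, iters=200):
    """Entropy-regularized optimal transport between histograms a and b
    with cost matrix C: alternate diagonal scaling updates until the
    transport plan's marginals match a and b."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)             # match column marginal
        u = a / (K @ v)               # match row marginal
    return u[:, None] * K * v[None, :]  # transport plan
```

Smaller `eps` approaches unregularized optimal transport but slows convergence of the iterates.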
( 2
min )
In climate simulations, small-scale processes shape ocean dynamics but remain
computationally expensive to resolve directly. For this reason, their
contributions are commonly approximated using empirical parameterizations,
which lead to significant errors in long-term projections. In this work, we
develop parameterizations based on Fourier Neural Operators, showcasing their
accuracy and generalizability in comparison to other approaches. Finally, we
discuss the potential and limitations of neural networks operating in the
frequency domain, paving the way for future investigation.
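The core of a Fourier Neural Operator layer is a spectral convolution: learned weights act on a truncated set of Fourier modes. A minimal 1-D sketch (illustrative; real FNO layers use complex weight tensors per channel plus a pointwise linear path):

```python
import numpy as np

def spectral_conv1d(u, weights):
    """Fourier-layer core: FFT the signal, linearly scale the lowest
    len(weights) modes, zero the rest, and transform back."""
    modes = len(weights)
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = weights * u_hat[:modes]
    return np.fft.irfft(out_hat, n=len(u))
```

Because the weights live in the frequency domain, the same layer applies to inputs sampled at any resolution, which is part of the generalizability the abstract highlights.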
( 2
min )
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. Also, the technique is capable of handling data
training with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
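A minimal numpy sketch of the core idea, using the label during imputation rather than ignoring it (here with simple class-conditional mean imputation; the stacking strategy described above applies the same principle to more powerful imputers):

```python
import numpy as np

def label_aware_impute(X, y):
    """Impute missing entries of X with the class-conditional mean:
    the label informs the imputation instead of being ignored."""
    X = X.copy()
    for c in np.unique(y):
        rows = (y == c)
        for j in range(X.shape[1]):
            col = X[rows, j]
            col[np.isnan(col)] = np.nanmean(col)
            X[rows, j] = col
    return X
```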
( 2
min )
Rodney Brooks, co-founder of iRobot, kicks off an MIT symposium on the promise and potential pitfalls of increasingly powerful AI tools like ChatGPT.
( 12
min )
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio. With this launch, you can programmatically run notebooks as jobs […]
( 11
min )
The rapid growth of generative AI brings promising new innovation, and at the same time raises new challenges. These challenges include some that were common before generative AI, such as bias and explainability, and new ones unique to foundation models (FMs), including hallucination and toxicity. At AWS, we are committed to developing generative AI responsibly, […]
( 9
min )
Since launching in June 2023, the AWS Generative AI Innovation Center team of strategists, data scientists, machine learning (ML) engineers, and solutions architects have worked with hundreds of customers worldwide, and helped them ideate, prioritize, and build bespoke solutions that harness the power of generative AI. Customers worked closely with us to prioritize use cases, […]
( 4
min )
Mira Murati as CTO, Greg Brockman returns as President. Read messages from CEO Sam Altman and board chair Bret Taylor.
( 5
min )
The magnitude of a metric space was recently established as a novel
invariant, providing a measure of the `effective size' of a space across
multiple scales. By capturing both geometrical and topological properties of
data, magnitude is poised to address challenges in unsupervised representation
learning tasks. We formalise a novel notion of dissimilarity between magnitude
functions of finite metric spaces and use them to derive a quality measure for
dimensionality reduction tasks. Our measure is provably stable under
perturbations of the data, can be efficiently calculated, and enables a
rigorous multi-scale comparison of embeddings. We show the utility of our
measure in an experimental suite that comprises different domains and tasks,
including the comparison of data visualisations.
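For a finite metric space, the magnitude at scale $t$ has a closed form: it is the sum of the entries of the inverse of the similarity matrix $Z_{ij} = e^{-t d_{ij}}$, whenever $Z$ is invertible. A minimal sketch:

```python
import numpy as np

def magnitude(D, t=1.0):
    """Magnitude of a finite metric space with distance matrix D at
    scale t: the sum of the entries of inv(Z), Z_ij = exp(-t * D_ij)."""
    Z = np.exp(-t * D)
    return float(np.linalg.inv(Z).sum())
```

Varying `t` traces out the magnitude function, whose dissimilarities across scales are what the quality measure above is built from.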
( 2
min )
Motivated by applications in text mining and discrete distribution inference,
we investigate the testing for equality of probability mass functions of $K$
groups of high-dimensional multinomial distributions. A test statistic, which
is shown to have an asymptotic standard normal distribution under the null, is
proposed. The optimal detection boundary is established, and the proposed test
is shown to achieve this optimal detection boundary across the entire parameter
space of interest. The proposed method is demonstrated in simulation studies
and applied to analyze two real-world datasets to examine variation among
consumer reviews of Amazon movies and diversity of statistical paper abstracts.
( 2
min )
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
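One natural way to write the windowed oracle described above (our notation): partition the horizon $T$ into windows $W_1, \dots, W_{\lceil T/w \rceil}$ of size $w$ and let the oracle play the best fixed arm within each window,
\[
\mathrm{OPT}(w) \;=\; \sum_{k=1}^{\lceil T/w \rceil} \max_{a} \sum_{t \in W_k} \mathbb{E}[r_t(a)],
\]
so that $w = T$ recovers the best-fixed-arm oracle in hindsight of the adversarial bandit and $w = 1$ recovers the dynamic oracle of the nonstationary bandit.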
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
The exploration-exploitation dilemma has been a central challenge in
reinforcement learning (RL) with complex model classes. In this paper, we
propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound
(MQL-UCB) for RL with general function approximation. Our key algorithmic
design includes (1) a general deterministic policy-switching strategy that
achieves low switching cost, (2) a monotonic value function structure with
carefully controlled function class complexity, and (3) a variance-weighted
regression scheme that exploits historical trajectories with high data
efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$
when $K$ is sufficiently large and near-optimal policy switching cost of
$\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$
being the planning horizon, and $K$ being the number of episodes.
Our work sheds light on designing provably sample-efficient and
deployment-efficient Q-learning with nonlinear function approximation.
( 2
min )
Constrained optimization of the parameters of a simulator plays a crucial
role in a design process. These problems become challenging when the simulator
is stochastic, computationally expensive, and the parameter space is
high-dimensional. One can efficiently perform optimization only by utilizing
the gradient with respect to the parameters, but these gradients are
unavailable in many legacy, black-box codes. We introduce the algorithm
Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the
issues mentioned earlier by efficiently estimating the gradient, reducing the
noise of the gradient estimator, and applying multi-fidelity schemes to further
reduce computational effort. We validate our approach on standard benchmarks,
demonstrating its effectiveness in optimizing parameters and highlighting better
performance than existing methods.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error as a function of the number of
samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
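A minimal sketch of a stochastic-approximation recursion in the spirit of the abstract: one common formulation of UBSR seeks the root $t$ of $\mathbb{E}[\ell(X - t)] = \lambda$, which a Robbins-Monro scheme tracks one sample at a time (step sizes and the exact scheme here are illustrative):

```python
import numpy as np

def ubsr_estimate(samples, loss, lam):
    """One-sample-at-a-time Robbins-Monro recursion for the root t of
    E[loss(X - t)] = lam."""
    t = 0.0
    for k, x in enumerate(samples):
        t += (loss(x - t) - lam) / (k + 1)  # step size 1/(k+1)
    return t
```

With the identity loss and `lam = 0` the recursion reduces to the running sample mean, a useful sanity check.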
( 2
min )
In a high-dimensional regression framework, we study consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
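A minimal numpy sketch of the two-step procedure analyzed above, with PCA via SVD followed by Gaussian-kernel ridge regression on the reduced inputs (purely illustrative; all names and defaults are our own):

```python
import numpy as np

def two_step_fit(X, y, k=2, gamma=1.0, alpha=1e-6):
    """Step 1: reduce X to its top-k principal components.
    Step 2: fit kernel ridge regression on the reduced inputs."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:k].T                      # projection onto top-k components
    Z = (X - mu) @ P                  # reduced training inputs
    K = np.exp(-gamma * ((Z[:, None] - Z[None, :]) ** 2).sum(-1))
    coef = np.linalg.solve(K + alpha * np.eye(len(y)), y)

    def predict(Xq):
        Zq = (Xq - mu) @ P
        Kq = np.exp(-gamma * ((Zq[:, None] - Z[None, :]) ** 2).sum(-1))
        return Kq @ coef

    return predict
```

The stability result above controls the extra error incurred in step 2 from fitting on the PCA-perturbed inputs `Z` instead of `X`.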
( 2
min )
Density power divergence (DPD) is designed to robustly estimate the
underlying distribution of observations, in the presence of outliers. However,
DPD involves an integral of the power of the parametric density models to be
estimated; the explicit form of the integral term can be derived only for
specific densities, such as normal and exponential densities. While we may
perform a numerical integration for each iteration of the optimization
algorithms, the computational complexity has hindered the practical application
of DPD-based estimation to more general parametric densities. To address the
issue, this study introduces a stochastic approach to minimize DPD for general
parametric density models. The proposed approach also can be employed to
minimize other density power-based $\gamma$-divergences, by leveraging
unnormalized models.
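The DPD objective in question (Basu et al., 1998) makes the troublesome integral explicit: for a parametric density $f_\theta$ and data distribution $g$,
\[
d_\alpha(g, f_\theta) \;=\; \int f_\theta^{1+\alpha}\,dx \;-\; \Bigl(1+\tfrac{1}{\alpha}\Bigr)\int g\, f_\theta^{\alpha}\,dx \;+\; \tfrac{1}{\alpha}\int g^{1+\alpha}\,dx,
\]
where the last term is free of $\theta$ and the middle term is estimated by a sample average; the first term, $\int f_\theta^{1+\alpha}\,dx$, is the integral that admits a closed form only for special families such as the normal and exponential, and it is exactly this term that the stochastic approach targets.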
( 2
min )
We study the long time behavior of an underdamped mean-field Langevin (MFL)
equation, and provide a general convergence as well as an exponential
convergence rate result under different conditions. The results on the MFL
equation can be applied to study the convergence of the Hamiltonian gradient
descent algorithm for the overparametrized optimization. We then provide a
numerical example of the algorithm to train a generative adversarial network
(GAN).
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions; one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized, and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate. We point out relations of the latter to
sub-Riemannian geometry.
( 2
min )
The convergence of deterministic policy gradient under the Hadamard
parameterization is studied in the tabular setting and the linear convergence
of the algorithm is established. To this end, we first show that the error
decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this
result, we further show that the algorithm has a faster local linear
convergence rate after $k_0$ iterations, where $k_0$ is a constant that only
depends on the MDP problem and the initialization. To show the local linear
convergence of the algorithm, we have indeed established the contraction of the
sub-optimal probability $b_s^k$ (i.e., the probability of the output policy
$\pi^k$ on non-optimal actions) when $k\ge k_0$.
( 2
min )
Navigating dynamic physical environments without obstructing or damaging
human assets is of quintessential importance for social robots. In this work,
we solve autonomous drone navigation's sub-problem of predicting out-of-domain
human and agent trajectories using a deep generative model. Our method,
General-PECNet (G-PECNet), achieves a 9.5\% improvement in Final Displacement
Error (FDE) over the 2020 benchmark PECNet through a combination of
architectural improvements inspired by periodic activation functions and
synthetic trajectory (data) augmentations using Hidden Markov Models (HMMs) and
Reinforcement Learning (RL). Additionally, we propose a simple
geometry-inspired metric for trajectory non-linearity and outlier detection,
helpful for the task. Code available at
$\href{https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git}{GitHub}$
( 2
min )
Federated learning is a new learning paradigm that decouples data collection
and model training via multi-party computation and model aggregation. As a
flexible learning setting, federated learning has the potential to integrate
with other learning frameworks. We conduct a focused survey of federated
learning in conjunction with other learning algorithms. Specifically, we
explore various learning algorithms to improve the vanilla federated averaging
algorithm and review model fusion methods such as adaptive aggregation,
regularization, clustered methods, and Bayesian methods. Following the emerging
trends, we also discuss federated learning in the intersection with other
learning paradigms, termed federated X learning, where X includes multitask
learning, meta-learning, transfer learning, unsupervised learning, and
reinforcement learning. This survey reviews the state of the art, challenges,
and future directions.
( 2
min )
As the adoption of Artificial Intelligence (AI) systems within the clinical
environment grows, limitations in bandwidth and compute can create
communication bottlenecks when streaming imaging data, leading to delays in
patient care and increased cost. As such, healthcare providers and AI vendors
will require greater computational infrastructure, thereby dramatically
increasing costs. To address this, we developed ISLE, an intelligent streaming
framework for high-throughput, compute- and bandwidth-optimized, and
cost-effective AI inference for clinical decision making at scale. In our
experiments, ISLE on average reduced data transmission by 98.02% and decoding
time by 98.09%, while increasing throughput by 2,730%. We show that ISLE
results in faster turnaround times, and reduced overall cost of data,
transmission, and compute, without negatively impacting clinical decision
making using AI systems.
( 2
min )
The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology
with a broad spectrum of acute and chronic findings. Precise diagnostic
criteria for a renal biopsy diagnosis of TMA are missing. As a first step
towards a machine learning- and computer vision-based analysis of whole slide
images from renal biopsies, we trained a segmentation model for the decisive
diagnostic kidney tissue compartments (artery, arteriole, glomerulus) on a set
of whole slide images from renal biopsies with TMAs and mimickers (distinct
diseases with a nephropathological appearance similar to TMA, such as severe
benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy,
and arteriolar light chain deposition disease). Our segmentation model combines a
U-Net-based tissue detection with a Shifted windows-transformer architecture to
reach excellent segmentation results for even the most severely altered
glomeruli, arterioles and arteries, even on unseen staining domains from a
different nephropathology lab. With accurate automatic segmentation of the
decisive renal biopsy compartments in human renal vasculopathies, we have laid
the foundation for large-scale compartment-specific machine learning and
computer vision analysis of renal biopsy repositories with TMAs.
( 3
min )
Explainable Artificial Intelligence (XAI) is targeted at understanding how
models perform feature selection and derive their classification decisions.
This paper explores post-hoc explanations for deep neural networks in the audio
domain. Notably, we present a novel open-source audio dataset consisting of
30,000 audio samples of English spoken digits which we use for classification
tasks on spoken digits and speakers' biological sex. We use the popular XAI
technique Layer-wise Relevance Propagation (LRP) to identify relevant features
for two neural network architectures that process either waveform or
spectrogram representations of the data. Based on the relevance scores obtained
from LRP, hypotheses about the neural networks' feature selection are derived
and subsequently tested through systematic manipulations of the input data.
Further, we take a step beyond visual explanations and introduce audible
heatmaps. We demonstrate the superior interpretability of audible explanations
over visual ones in a human user study.
( 2
min )
In the field of statistical physics, machine learning has gained significant
popularity and has achieved remarkable results in recent studies on phase
transitions. In this paper, we apply unsupervised Principal Component Analysis
(PCA) and Autoencoder (AE) methods to study various configurations of the
percolation model at its equilibrium phase transition. In
certain phase transition models, such as the DP model in non-equilibrium phase
transitions, the order parameter is particle density. However, in some other
phase transition models, such as the percolation model, it is not. This study
used randomly generated percolation graphs as input to a neural network and
analyzed the outputs, indicating that the single latent variable of the AE and
the first principal component of PCA are signals related to particle density.
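The reported link between the first principal component and particle density can be reproduced on synthetic data. The sketch below (our illustration, not the paper's code; the lattice size, sample count, and uniform range of occupation probabilities are arbitrary choices) runs PCA on random site-percolation configurations:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 16                                  # lattice side (illustrative choice)
p = rng.uniform(0.1, 0.9, size=500)     # occupation probability per sample
# random site-percolation configurations: L*L binary occupancy maps
X = (rng.random((500, L * L)) < p[:, None]).astype(float)

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                        # first principal component scores

density = X.mean(axis=1)                # particle density of each configuration
corr = np.corrcoef(pc1, density)[0, 1]
print(abs(corr))                        # close to 1: PC1 tracks density
```

Because the occupation probability is the dominant source of variance across configurations, the leading component aligns with density, which is the effect the abstract describes.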
( 2
min )
We introduce a generalizable approach that combines perturbation method and
one-shot transfer learning to solve nonlinear ODEs with a single polynomial
term, using Physics-Informed Neural Networks (PINNs). Our method transforms
non-linear ODEs into linear ODE systems, trains a PINN across varied
conditions, and offers a closed-form solution for new instances within the same
non-linear ODE class. We demonstrate the effectiveness of this approach on the
Duffing equation and suggest its applicability to similarly structured PDEs and
ODE systems.
( 2
min )
In recent years, Large Language Models (LLMs) have emerged as pivotal tools in
various applications. However, these models are susceptible to adversarial
prompt attacks, where attackers can carefully curate input strings that lead to
undesirable outputs. The inherent vulnerability of LLMs stems from their
input-output mechanisms, especially when presented with intensely
out-of-distribution (OOD) inputs. This paper proposes a token-level detection
method to identify adversarial prompts, leveraging the LLM's capability to
predict the next token's probability. We measure the degree of the model's
perplexity and incorporate neighboring token information to encourage the
detection of contiguous adversarial prompt sequences. As a result, we propose
two methods: one that identifies each token as either being part of an
adversarial prompt or not, and another that estimates the probability of each
token being part of an adversarial prompt.
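The neighbor-smoothing idea can be sketched without a real LLM by operating on synthetic per-token negative log-likelihoods (all numbers below, including the threshold, are hypothetical stand-ins for actual model outputs):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic per-token negative log-likelihoods from a hypothetical LLM:
# natural text ~ low NLL, an injected adversarial suffix ~ high NLL
nll = np.concatenate([rng.normal(2.0, 0.5, 40),    # benign tokens
                      rng.normal(8.0, 0.5, 10)])   # adversarial tokens

def smooth(x, w=2):
    """Average each token's NLL with its w neighbors on each side,
    encouraging detection of contiguous high-perplexity runs."""
    k = np.ones(2 * w + 1) / (2 * w + 1)
    return np.convolve(x, k, mode="same")

score = smooth(nll)
flags = score > 5.0          # hypothetical decision threshold
print(flags[40:].mean())     # fraction of adversarial tokens flagged
```

Smoothing trades a little precision at run boundaries for much higher recall on contiguous adversarial spans, which matches the paper's motivation for incorporating neighboring-token information.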
( 2
min )
Zero-shot Dialogue State Tracking (DST) addresses the challenge of acquiring
and annotating task-oriented dialogues, which can be time-consuming and costly.
However, DST extends beyond simple slot-filling and requires effective updating
strategies for tracking dialogue state as conversations progress. In this
paper, we propose ParsingDST, a new In-Context Learning (ICL) method, to
introduce additional intricate updating strategies in zero-shot DST. Our
approach reformulates the DST task by leveraging powerful Large Language Models
(LLMs) and translating the original dialogue text to JSON through semantic
parsing as an intermediate state. We also design a novel framework that
includes more modules to ensure the effectiveness of updating strategies in the
text-to-JSON process. Experimental results demonstrate that our approach
outperforms existing zero-shot DST methods on MultiWOZ, exhibiting significant
improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to
existing ICL methods. Our code has been released.
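The text-to-JSON intermediate state suggests a simple slot-merge step. The helper below is our minimal sketch of such update semantics (the delete-via-null convention is an assumption for illustration, not the paper's exact framework):

```python
import json

def update_state(state: dict, turn_parse: dict) -> dict:
    """Merge a turn-level semantic parse into the dialogue state.
    A value of None deletes a slot; other values insert or overwrite."""
    new_state = dict(state)
    for domain, slots in turn_parse.items():
        merged = dict(new_state.get(domain, {}))
        for slot, value in slots.items():
            if value is None:
                merged.pop(slot, None)
            else:
                merged[slot] = value
        new_state[domain] = merged
    return new_state

state = {}
state = update_state(state, {"hotel": {"area": "north", "stars": "4"}})
state = update_state(state, {"hotel": {"stars": None, "price": "cheap"}})
print(json.dumps(state))  # {"hotel": {"area": "north", "price": "cheap"}}
```

Representing each turn as JSON makes the update a pure dictionary merge, which is the kind of explicit updating strategy the framework's extra modules are meant to enforce.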
( 2
min )
To process sensor data in the Internet of Things (IoT), embedded deep
learning for 1-dimensional data is an important technique. In the past, CNNs
were frequently used because they are simple to optimise for special embedded
hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed
at energy-efficient inference on end devices. Using the traffic speed
prediction as a case study, a vanilla LSTM model with the optimised LSTM cell
achieves 17534 inferences per second while consuming only 3.8 $\mu$J per
inference on the XC7S15 FPGA from the Spartan-7 family. It achieves at least
5.4$\times$ higher throughput and is 1.37$\times$ more energy efficient than
existing approaches.
( 2
min )
The ability to construct a realistic simulator of financial exchanges,
including reproducing the dynamics of the limit order book, can give insight
into many counterfactual scenarios, such as a flash crash, a margin call, or
changes in macroeconomic outlook. In recent years, agent-based models have been
developed that reproduce many features of an exchange, as summarised by a set
of stylised facts and statistics. However, the ability to calibrate simulators
to a specific period of trading remains an open challenge. In this work, we
develop a novel approach to the calibration of market simulators by leveraging
recent advances in deep learning, specifically using neural density estimators
and embedding networks. We demonstrate that our approach is able to correctly
identify high probability parameter sets, both when applied to synthetic and
historical data, and without reliance on manually selected or weighted
ensembles of stylised facts.
( 2
min )
Normalizing flows (NF) recently gained attention as a way to construct
generative networks with exact likelihood calculation out of composable layers.
However, NF is restricted to dimension-preserving transformations. Surjection
VAE (SurVAE) has been proposed to extend NF to dimension-altering
transformations. Such networks are desirable because they are expressive and
can be precisely trained. We show that the approaches are a re-invention of PDF
projection, which appeared over twenty years earlier and is much further
developed.
( 2
min )
We present a new method that includes three key components of distributed
optimization and federated learning: variance reduction of stochastic
gradients, partial participation, and compressed communication. We prove that
the new method has optimal oracle complexity and state-of-the-art communication
complexity in the partial participation setting. Regardless of the
communication compression feature, our method successfully combines variance
reduction and partial participation: we get the optimal oracle complexity,
never need the participation of all nodes, and do not require the bounded
gradients (dissimilarity) assumption.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in terms of the number of
samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
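A minimal Robbins-Monro sketch of the recursive UBSR estimator follows; the exponential loss function and the uniform loss distribution are illustrative choices that admit a closed-form root for comparison, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, lam = 1.0, 1.0                      # hypothetical risk parameters
ell = lambda x: np.exp(beta * x)          # exponential (entropic) loss function

# UBSR(L) is the root t* of g(t) = E[ell(L - t)] - lam.
# Robbins-Monro recursion: one loss sample per step, step size 1/n.
t = 0.0
for n in range(1, 50001):
    L = rng.uniform(-1.0, 1.0)            # one-at-a-time loss sample
    t += (ell(L - t) - lam) / n

# closed form for uniform(-1, 1) losses under the exponential loss function
t_star = np.log(np.sinh(beta) / (beta * lam)) / beta
print(t, t_star)                          # estimate converges to the root
```

Casting estimation as root finding means each new sample only nudges the running estimate, which is exactly the one-at-a-time setting described in the abstract.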
( 2
min )
Artificial neural networks can be represented by paths. Generating these paths
as random walks on a dense network graph, we find that the resulting sparse
networks allow for deterministic initialization and even weights with fixed
sign. Such
networks can be trained sparse from scratch, avoiding the expensive procedure
of training a dense network and compressing it afterwards. Although sparse,
weights are accessed as contiguous blocks of memory. In addition, enumerating
the paths using deterministic low discrepancy sequences, for example the Sobol'
sequence, amounts to connecting the layers of neural units by progressive
permutations, which naturally avoids bank conflicts in parallel computer
hardware. We demonstrate that the artificial neural networks generated by low
discrepancy sequences can achieve an accuracy within reach of their dense
counterparts at a much lower computational complexity.
( 2
min )
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and the
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
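The windowed oracle can be written down directly: partition the horizon into length-$W$ windows and credit the best fixed arm within each. A sketch on synthetic mean rewards (horizon, arm count, and window size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T, K, W = 1000, 3, 100          # horizon, arms, window size (illustrative)
means = rng.random((T, K))      # time-varying mean rewards, one row per round

def windowed_oracle(means, W):
    """Cumulative reward of playing the best FIXED arm within each
    length-W window. W = T recovers the best-fixed-arm-in-hindsight
    oracle; W = 1 recovers the dynamic oracle."""
    total = 0.0
    for s in range(0, len(means), W):
        block = means[s:s + W]
        total += block.sum(axis=0).max()   # best fixed arm in this window
    return total

fixed = windowed_oracle(means, T)    # adversarial-bandit oracle
dynamic = windowed_oracle(means, 1)  # nonstationary-bandit oracle
mid = windowed_oracle(means, W)
print(fixed <= mid <= dynamic)       # oracle value grows as windows shrink
```

Shrinking the window can only increase the oracle's value, since a finer partition may switch arms more often; this monotonicity is what lets one formulation interpolate between the two classical ones.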
( 2
min )
Deep neural networks (DNNs), the agents of deep learning (DL), require a
massive number of parallel/sequential operations. This makes it difficult to
comprehend DNNs' operations and impedes proper diagnosis. Without better
knowledge of their internal process, deploying DNNs in high-stakes domains can
lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be
deployed in high-stakes real-world problems, it is imperative that we gain
insights into DNNs' internal operations underlying their decision-making. Here,
we use the self-organizing map (SOM) to analyze DL models' internal codes
associated with DNNs' decision-making. Our analyses suggest that shallow layers
close to the input layer compress features into condensed space and that deep
layers close to the output layer expand feature space. We also found evidence
indicating that compressed features may underlie DNNs' vulnerabilities to
adversarial perturbations.
( 2
min )
In a high-dimensional regression framework, we study consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
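A minimal sketch of the two-step procedure, with PCA followed by Gaussian kernel ridge regression (the dimensions, bandwidth, and ridge parameter are arbitrary choices for illustration, not the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 200, 20, 2            # samples, ambient dim, reduced dim
# inputs that truly live near a k-dimensional subspace
Z = rng.normal(size=(n, k))
A = rng.normal(size=(k, d))
X = Z @ A + 0.01 * rng.normal(size=(n, d))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=n)

# Step 1: reduce dimension with PCA
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Xr = Xc @ Vt[:k].T

# Step 2: Gaussian kernel ridge regression on the reduced inputs
def gram(A_, B_, h=1.0):
    d2 = ((A_[:, None, :] - B_[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h ** 2))

ridge = 1e-2
K_ = gram(Xr, Xr)
alpha = np.linalg.solve(K_ + ridge * np.eye(n), y)
y_hat = K_ @ alpha
mse = np.mean((y - y_hat) ** 2)
print(mse)                      # small in-sample error
```

The stability result in the abstract is what controls the error introduced by replacing the true inputs with their PCA projections in the second step.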
( 2
min )
Linear regression is one of the most fundamental linear algebra problems.
Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal
is to find $x'$ such that
$ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best
classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson
and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand,
quantum linear regression algorithms can achieve exponential quantum speedups,
as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017,
Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of
these algorithms depend on some quantum linear algebra-related parameters, such
as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum
algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) +
\mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$
over the classical lower bound without any dependence on data-dependent
parameters. In addition, we also show our result can be generalized to multiple
regression and ridge linear regression.
( 2
min )
Mini-EUSO is a wide-angle fluorescence telescope that registers ultraviolet
(UV) radiation in the nocturnal atmosphere of Earth from the International
Space Station. Meteors are among multiple phenomena that manifest themselves
not only in the visible range but also in the UV. We present two simple
artificial neural networks that allow for recognizing meteor signals in the
Mini-EUSO data with high accuracy in terms of a binary classification problem.
We expect that similar architectures can be effectively used for signal
recognition in other fluorescence telescopes, regardless of the nature of the
signal. Due to their simplicity, the networks can be implemented in onboard
electronics of future orbital or balloon experiments.
( 3
min )
This document describes an approach used in the Multi-Machine Disruption
Prediction Challenge for Fusion Energy by ITU, a data science competition which
ran from September to November 2023, on the online platform Zindi. The
competition involved data from three fusion devices - C-Mod, HL-2A, and J-TEXT
- with most of the training data coming from the last two, and the test data
coming from the first one. Each device has multiple diagnostics and signals,
and it turns out that a critical issue in this competition was to identify
which signals, and especially which features from those signals, were most
relevant to achieve accurate predictions. The approach described here is based
on extracting features from signals, and then applying logistic regression on
top of those features. Each signal is treated as a separate predictor and, in
the end, a combination of such predictors achieved the first place on the
leaderboard.
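The winning recipe of per-signal feature extraction followed by logistic regression can be sketched on synthetic shot data (the features and the synthetic upward drift are our stand-ins, not the competition's actual diagnostics):

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 400, 64                       # shots, samples per signal (synthetic)
labels = rng.integers(0, 2, n)
# synthetic diagnostic signal: disruptive shots drift upward near the end
t = np.linspace(0, 1, T)
signals = rng.normal(size=(n, T)) + labels[:, None] * 3 * t

def features(sig):
    """Per-signal summary features: mean, std, end-minus-start level."""
    return np.stack([sig.mean(1), sig.std(1),
                     sig[:, -8:].mean(1) - sig[:, :8].mean(1)], axis=1)

X = np.hstack([features(signals), np.ones((n, 1))])  # bias column

# plain logistic regression trained by gradient descent
w = np.zeros(X.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - labels) / n

acc = ((X @ w > 0) == labels).mean()
print(acc)   # high on this separable synthetic data
```

Treating each signal as a separate predictor and combining their outputs, as the approach describes, would repeat this pipeline per signal and average or vote over the resulting probabilities.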
( 2
min )
On dedicated analog hardware, equilibrium propagation is an energy-efficient
alternative to backpropagation. In spite of its theoretical guarantees, its
application in the AI domain remains limited to the discriminative setting.
Meanwhile, despite its high computational demands, generative AI is on the
rise. In this paper, we demonstrate the application of Equilibrium Propagation
in training a variational autoencoder (VAE) for generative modeling. Leveraging
the symmetric nature of Hopfield networks, we propose using a single model to
serve as both the encoder and decoder, which could effectively halve the
required chip size for VAE implementations, paving the way for more efficient
analog hardware configurations.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or
distribution shift, between the dataset and the distribution over states and
actions visited by the learned policy. This problem is exacerbated in the fully
offline setting. The main approach to correct this shift has been through
importance sampling, which leads to high-variance gradients. Other approaches,
such as conservatism or behavior-regularization, regularize the policy at the
cost of performance. In this paper, we propose a new approach for stable
off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is
a novel actor-critic algorithm that simultaneously reweights off-policy samples
and constrains the policy to prevent divergence and reduce value-approximation
error. In our experiments, POP-QL not only shows competitive performance on
standard benchmarks, but also out-performs competing methods in tasks where the
data-collection policy is significantly sub-optimal.
( 2
min )
Foundation models, specifically Large Language Models (LLMs), have recently
gained widespread attention and adoption. Reinforcement Learning with Human
Feedback (RLHF) involves training a reward model to capture desired behaviors,
which is then used to align an LLM. These reward models are additionally used
at inference-time to estimate how well LLM responses adhere to those desired
behaviors. However, there is little work measuring how robust these reward
models are to distribution shifts. In this work, we evaluate how reward model
performance - measured via accuracy and calibration (i.e. alignment between
accuracy and confidence) - is affected by distribution shift. We show novel
calibration patterns and accuracy drops due to OOD prompts and responses, and
that the reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting in order to detect these
distribution shifts in prompts and responses.
( 2
min )
In this research, we developed a graph-based framework to represent various
aspects of optimal thermal management system design, with the aim of rapidly
and efficiently identifying optimal design candidates. Initially, the
graph-based framework is utilized to generate diverse thermal management system
architectures. The dynamics of these system architectures are modeled under
various loading conditions, and an open-loop optimal controller is employed to
determine each system's optimal performance. These modeled cases constitute the
dataset, with the corresponding optimal performance values serving as the
labels for the data. In the subsequent step, a Graph Neural Network (GNN) model
is trained on 30% of the labeled data to predict the systems' performance,
effectively addressing a regression problem. Utilizing this trained model, we
estimate the performance values for the remaining 70% of the data, which serves
as the test set. In the third step, the predicted performance values are
employed to rank the test data, facilitating prioritized evaluation of the
design scenarios. Specifically, a small subset of the test data with the
highest estimated ranks undergoes evaluation via the open-loop optimal control
solver. This targeted approach concentrates on evaluating higher-ranked designs
identified by the GNN, replacing the exhaustive search (enumeration-based) of
all design cases. The results demonstrate a significant average reduction of
over 92% in the number of system dynamic modeling and optimal control analyses
required to identify optimal design scenarios.
( 3
min )
Since no solutions have yet been proposed in Colombia to reduce residential
electricity consumption, this paper describes the design and implementation of
a simple, low-cost prototype home energy management system (HEMS). The
platform monitors the energy consumption of typical household devices so that
users can view the consumption of each device separately and then devise a
strategy to reduce energy consumption at home. To demonstrate the system's
viability, we evaluated it by measuring weekly energy consumption with the
online and offline HEMS on a test bench of typical household devices in a
typical household in Sincelejo. The evaluation showed that installing this
HEMS reduces consumption by 27%, demonstrating that a good reduction
percentage can be achieved with a low-cost system.
( 2
min )
This paper investigates an approach to both speed up business decision-making
and lower the cost of learning through experimentation by factorizing business
policies and employing fractional factorial experimental designs for their
evaluation. We illustrate how this method integrates with advances in the
estimation of heterogeneous treatment effects, elaborating on its advantages
and foundational assumptions. We empirically demonstrate the implementation and
benefits of our approach and assess its validity in evaluating consumer
promotion policies at DoorDash, which is one of the largest delivery platforms
in the US. Our approach discovers a policy with 5% incremental profit at 67%
lower implementation cost.
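As a concrete illustration of fractional factorial designs (our example; the factor count has nothing to do with DoorDash's actual experiment), a $2^{3-1}$ half fraction evaluates three two-level policy factors in four runs instead of eight:

```python
import itertools
import numpy as np

# A 2^(3-1) half-fraction design: run a full 2^2 design in factors A and B,
# then alias the third factor as C = A*B (defining relation I = ABC).
# Four runs, rather than eight, still estimate all three main effects.
base = np.array(list(itertools.product([-1, 1], repeat=2)))
design = np.column_stack([base, base[:, 0] * base[:, 1]])
print(design)
```

The main-effect columns remain mutually orthogonal, which is what keeps the per-factor effect estimates clean while cutting the number of policy variants that must be launched.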
( 2
min )
There is growing concern that the potential of black box AI may exacerbate
health-related disparities and biases such as gender and ethnicity in clinical
decision-making. Biased decisions can arise from data availability and
collection processes, as well as from the underlying confounding effects of the
protected attributes themselves. This work proposes a machine learning-based
orthogonal approach aiming to analyze and suppress the effect of the confounder
through discriminant dimensionality reduction and orthogonalization of the
protected attributes against the primary attribute information. By doing so,
the impact of the protected attributes on disease diagnosis can be realized,
undesirable feature correlations can be mitigated, and the model prediction
performance can be enhanced.
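The orthogonalization step can be sketched as linear residualization: project the features onto the orthogonal complement of the protected attribute so that no linear trace of it remains (a simplified stand-in for the paper's discriminant-based approach):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
protected = rng.integers(0, 2, n).astype(float)   # e.g. a binary attribute
# features confounded by the protected attribute
X = rng.normal(size=(n, 5)) + 0.8 * protected[:, None]

# project features onto the orthogonal complement of the protected attribute
Z = np.column_stack([np.ones(n), protected])      # intercept + attribute
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)             # hat (projection) matrix
X_orth = X - P @ X                                # residualized features

# after orthogonalization, features carry no linear protected signal
corr = np.abs([np.corrcoef(X_orth[:, j], protected)[0, 1] for j in range(5)])
print(corr.max())                                 # numerically zero
```

Because regression residuals are orthogonal to the column space of $Z$, any downstream model trained on the residualized features cannot linearly recover the protected attribute, which mitigates the undesirable feature correlations the abstract describes.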
( 2
min )
With the rise of Large Language Models (LLMs), notably characterized by GPT
frameworks, there emerges a catalyst for novel healthcare applications. Earlier
iterations of chatbot caregivers, though they exist, have yet to achieve
human-like authenticity. This paper unveils `MemoryCompanion', a
pioneering digital health solution explicitly tailored for Alzheimer's disease
(AD) patients and their caregivers. Drawing upon the nuances of GPT technology
and prompt engineering, MemoryCompanion manifests a personalized caregiving
paradigm, fostering interactions via voice-cloning and talking-face mechanisms
that resonate with the familiarity of known companions. Using advanced
prompt-engineering, the system intricately adapts to each patient's distinct
profile, curating its content and communication style accordingly. This
approach strives to counteract prevalent issues of social isolation and
loneliness frequently observed in AD demographics. Our methodology, grounded in
its innovative design, addresses both the caregiving and technological
challenges intrinsic to this domain.
( 2
min )
In this work, we present a method to generate a configurational level
fingerprint for polymers using the Bead-Spring-Model. Unlike some of the
previous fingerprinting approaches that employ monomer-level information where
atomistic descriptors are computed using quantum chemistry calculations, this
approach incorporates configurational information from a coarse-grained model
of a long polymer chain. The proposed approach may be advantageous for the
study of behavior resulting from large molecular weights. To create this
fingerprint, we make use of two kinds of descriptors. First, we calculate
certain geometric descriptors, such as $R_e^2$ and $R_g^2$, and label them as Calculated
Descriptors. Second, we generate a set of data-driven descriptors using an
unsupervised autoencoder model and call them Learnt Descriptors. Using a
combination of both of them, we are able to learn mappings from the structure
to various properties of the polymer chain by training ML models. We test our
fingerprint to predict the probability of occurrence of a configuration at
equilibrium, which is approximated by a simple linear relationship between the
instantaneous internal energy and equilibrium average internal energy.
( 2
min )
Through advances in natural language processing (NLP), specifically in
speech recognition, fully automated complex systems functioning on voice input
have started proliferating in areas such as home automation. These systems have
been termed Automatic Speech Recognition Systems (ASR). In this review paper,
we explore the feasibility of an end-to-end system providing speech and text
based natural language processing for job interview preparation as well as
recommendation of relevant job postings. We also explore existing
recommender-based systems and note their limitations. This literature review
would help us identify the approaches and limitations of the various similar
use-cases of NLP technology for our upcoming project.
( 2
min )
Amazon Web Services and NVIDIA will bring the latest generative AI technologies to enterprises worldwide. Combining AI and cloud computing, NVIDIA founder and CEO Jensen Huang joined AWS CEO Adam Selipsky Tuesday on stage at AWS re:Invent 2023 at the Venetian Expo Center in Las Vegas. Selipsky said he was “thrilled” to announce the expansion…
( 6
min )
Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services. Announced today at AWS re:Invent, the initiative gives healthcare and life sciences developers using AWS cloud resources the flexibility to integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo — a generative…
( 6
min )
Developing more intelligent robots in the cloud is about to get a speed multiplier. NVIDIA Isaac Sim and NVIDIA L40S GPUs are coming to Amazon Web Services, enabling developers to build and deploy accelerated robotics applications in the cloud. Isaac Sim, an extensible simulator for AI-enabled robots, is built on the NVIDIA Omniverse development platform…
( 6
min )
Everything about large language models is big — giant models train on massive datasets across thousands of NVIDIA GPUs. That can pose a lot of big challenges for companies pursuing generative AI. NVIDIA NeMo, a framework for building, customizing and running LLMs, helps overcome these challenges. A team of experienced scientists and developers at Amazon…
( 5
min )
This week’s talented In the NVIDIA Studio artist, Nourhan Ismail, created a literal NVIDIA studio.
( 7
min )
The immediate and pressing need for ‘digitizing’ your supply-chain. One may conclude: ‘Digitizing’ the supply-chain has become a survival necessity for companies to stay competitive. Apart from a substantial jump in efficiency-effectiveness, the customer experience, and upside to revenues, companies can expect a huge cost-saving… A Look at the Future: Components of Data-driven (Digital) Supply-chain…
The post Data-driven, AI-powered supply chain part 3: Imagining the Future – Supply chain 5.0 appeared first on Data Science Central.
( 25
min )
The viability of the ‘Viable Vision’. I did hear about the Theory of Constraints (TOC) off and on through the late 90s, but I didn’t pay much attention until late 2001. One of the i2 consultants I met at their annual meet in Malaysia had one too many, and ended up lecturing me on how…
The post Data-driven supply chain part 2: The theory of constraints & the concept of the information supply chain. appeared first on Data Science Central.
( 28
min )
While the world is going wild over the potential benefits of generative AI, there’s little attention paid to the data deployed to build and operate these tools. Let’s look at a few examples to explore what’s involved in determining data use, and why this matters for end users as well as operators. Text-based generative AI…
The post Here’s How Much Data Gets Used By Generative AI Tools For Each Request appeared first on Data Science Central.
( 21
min )
Earlier in the fall, Charles Hoffman joined our non-profit Dataworthy Collective (DC) that focuses on best practices in trusted knowledge graph development. Hoffman is a CPA, consultant and former PwC auditor who works with clients who use the Extensible Business Reporting Language (XBRL). For those who don’t know the history of standard digital business reporting,… Read More »Trusted, automated data sharing across spreadsheets and other documents
The post Trusted, automated data sharing across spreadsheets and other documents appeared first on Data Science Central.
( 20
min )
Learning unsupervised world models for autonomous driving has the potential
to improve the reasoning capabilities of today's systems dramatically. However,
most work neglects the physical attributes of the world and focuses on sensor
data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel
Representations to address this challenge. We utilize raw camera and lidar data
to learn a sensor-agnostic geometric representation of the world, which can
be used directly by downstream tasks, such as planning. We demonstrate
multimodal future predictions and show that our geometric representation
improves the prediction quality of both camera images and lidar point clouds.
( 2
min )
In echocardiographic view classification, accurately detecting
out-of-distribution (OOD) data is essential but challenging, especially given
the subtle differences between in-distribution and OOD data. While conventional
OOD detection methods, such as Mahalanobis distance (MD), are effective in
far-OOD scenarios with clear distinctions between distributions, they struggle
to discern the less obvious variations characteristic of echocardiographic
data. In this study, we introduce a novel use of label smoothing to enhance
semantic feature representation in echocardiographic images, demonstrating that
these enriched semantic features are key for significantly improving near-OOD
instance detection. By combining label smoothing with MD-based OOD detection,
we establish a new benchmark for accuracy in echocardiographic OOD detection.
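The MD-based scoring that the abstract builds on can be sketched in a few lines: fit class-conditional Gaussian statistics on in-distribution features and score a sample by its minimum Mahalanobis distance to any class mean. This is a minimal numpy illustration on toy features, not the paper's pipeline; the label-smoothing step, which shapes the features themselves, is omitted.

```python
import numpy as np

def fit_md_params(feats, labels):
    """Fit class means and a shared (tied) precision matrix on in-distribution features."""
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([feats[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return means, np.linalg.inv(cov)

def md_score(x, means, prec):
    """OOD score: minimum squared Mahalanobis distance to any class mean."""
    return min(float((x - m) @ prec @ (x - m)) for m in means.values())

# Toy demo: two tight in-distribution clusters; a far-away point scores high.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
means, prec = fit_md_params(feats, labels)
s_in = md_score(feats[0], means, prec)
s_out = md_score(np.array([10.0, -10.0]), means, prec)
```

A higher score means "farther from every class", so a simple threshold on the score separates in-distribution from OOD samples.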
( 2
min )
Tabular data is hard to acquire and is subject to missing values. This paper
proposes a novel approach to generate and impute mixed-type (continuous and
categorical) tabular data using score-based diffusion and conditional flow
matching. Contrary to previous work that relies on neural networks to learn the
score function or the vector field, we instead rely on XGBoost, a popular
Gradient-Boosted Tree (GBT) method. We empirically show on 27 different
datasets that our approach i) generates highly realistic synthetic data when
the training dataset is either clean or tainted by missing data and ii)
generates diverse plausible data imputations. Furthermore, our method
outperforms deep-learning generation methods on data generation and is
competitive on data imputation. Finally, it can be trained in parallel using
CPUs without the need for a GPU. To make it easily accessible, we release our
code through a Python library and an R package.
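The conditional flow matching side of this approach reduces to a plain regression problem, which is what makes a GBT like XGBoost usable as the learner. Below is a minimal numpy sketch of how the training pairs are constructed: a linear noise-to-data path with constant-velocity targets. The regressor itself is left out; any model mapping (x_t, t) to v could be plugged in.

```python
import numpy as np

def flow_matching_pairs(x1, rng):
    """Construct conditional flow matching training pairs for a data batch x1.

    Linear path x_t = (1 - t) * x0 + t * x1 with noise x0 ~ N(0, I); the
    regression target is the constant velocity x1 - x0. Any regressor that
    maps (x_t, t) -> v can be trained on these pairs, e.g. one GBT model
    per output dimension."""
    x0 = rng.standard_normal(x1.shape)            # noise endpoints
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1                  # points along the path
    v = x1 - x0                                   # velocity targets
    return np.concatenate([xt, t], axis=1), v

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, size=(256, 3))        # toy "tabular" batch
X, y = flow_matching_pairs(data, rng)
```

Generation then amounts to integrating the learned velocity field from noise at t=0 to data at t=1.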
( 2
min )
A common forecasting setting in real world applications considers a set of
possibly heterogeneous time series of the same domain. Due to different
properties of each time series such as length, obtaining forecasts for each
individual time series in a straightforward way is challenging. This paper
proposes a general framework utilizing a similarity measure in Dynamic Time
Warping to find similar time series to build neighborhoods in a k-Nearest
Neighbor fashion, and improve forecasts of possibly simple models by averaging.
Several ways of performing the averaging are suggested, and theoretical
arguments underline the usefulness of averaging for forecasting. Additionally,
diagnostic tools are proposed, allowing a deep understanding of the procedure.
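A minimal sketch of the neighborhood-averaging idea, assuming a plain DTW distance and equal-weight averaging of the neighbors' forecasts; the paper suggests several averaging variants, of which this shows only the simplest.

```python
import numpy as np

def dtw(a, b):
    """Classic O(nm) dynamic time warping distance between two series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_forecast(target, pool, forecasts, k=3):
    """Average the forecasts of the k series closest to `target` under DTW."""
    dists = [dtw(target, s) for s in pool]
    idx = np.argsort(dists)[:k]
    return np.mean([forecasts[i] for i in idx], axis=0)

# Toy demo: a near-flat series borrows the forecasts of its flat neighbours.
pool = [np.ones(10), np.ones(10) * 1.1, np.arange(10.0), -np.arange(10.0)]
forecasts = [np.ones(3), np.ones(3) * 1.1, np.full(3, 10.0), np.full(3, -10.0)]
pred = knn_forecast(np.ones(10) * 1.05, pool, forecasts, k=2)
```

DTW handles series of unequal lengths, which is exactly what makes it suitable for building neighborhoods over heterogeneous collections.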
( 2
min )
Recent results show that estimates defined by over-parametrized deep neural
networks learned by applying gradient descent to a regularized empirical $L_2$
risk are universally consistent and achieve good rates of convergence. In this
paper, we show that the regularization term is not necessary to obtain similar
results. In the case of a suitably chosen initialization of the network, a
suitable number of gradient descent steps, and a suitable step size we show
that an estimate without a regularization term is universally consistent for
bounded predictor variables. Additionally, we show that if the regression
function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the
$L_2$ error converges to zero with a convergence rate of approximately
$n^{-1/(1+d)}$. Furthermore, in the case of an interaction model, where the
regression function consists of a sum of H\"older smooth functions with $d^*$
components, a rate of convergence is derived which does not depend on the input
dimension $d$.
( 2
min )
Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: Amazon EC2 P5e instances powered […]
( 4
min )
Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) and adds support for NVIDIA’s TensorRT-LLM Library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits – Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% […]
( 9
min )
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Most real-world data exists in unstructured formats like PDFs, which requires preprocessing before it can be used effectively. According to IDC, […]
( 10
min )
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the third post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1 and Part 2, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their […]
( 12
min )
Artificial intelligence (AI) continues to transform how we do business and serve our customers. AWS offers a range of pre-trained AI services that provide ready-to-use intelligence for your applications. In this post, we explore the new AI service capabilities and how they are enhanced using foundation models (FMs). We focus on the following major updates […]
( 7
min )
In this post, we talk about how generative AI is changing the conversational AI industry by providing new customer and bot builder experiences, and the new features in Amazon Lex that take advantage of these advances. As the demand for conversational AI continues to grow, developers are seeking ways to enhance their chatbots with human-like […]
( 7
min )
Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.
( 11
min )
Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it straightforward for you to add speech-to-text capabilities to your applications. Today, we are happy to announce a next-generation multi-billion parameter speech foundation model-powered system that expands automatic speech recognition to over 100 languages. In this post, we discuss some of the […]
( 7
min )
Today, we are excited to announce three launches that will help you enhance personalized customer experiences using Amazon Personalize and generative AI. Whether you’re looking for a managed solution or build your own, you can use these new capabilities to power your journey. Amazon Personalize is a fully managed machine learning (ML) service that makes […]
( 8
min )
Amazon Personalize is excited to announce the new Next Best Action (aws-next-best-action) recipe to help you determine the best actions to suggest to your individual users that will enable you to increase brand loyalty and conversion. Amazon Personalize is a fully managed machine learning (ML) service that makes it effortless for developers to deliver highly […]
( 8
min )
NVIDIA today launched a cloud service for medical imaging AI to further streamline and accelerate the creation of ground-truth data and training of specialized AI models through fully managed, cloud-based application programming interfaces. NVIDIA MONAI cloud APIs — announced at the annual meeting of RSNA, the Radiological Society of North America, taking place this week…
( 7
min )
This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group. The BMW Group – headquartered in Munich, Germany – is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and […]
( 11
min )
In today’s ever-evolving world of ecommerce, the influence of a compelling product description cannot be overstated. It can be the decisive factor that turns a potential visitor into a paying customer or sends them clicking off to a competitor’s site. The manual creation of these descriptions across a vast array of products is a labor-intensive […]
( 9
min )
Amazon SageMaker Canvas is a rich, no-code Machine Learning (ML) and Generative AI workspace that has allowed customers all over the world to more easily adopt ML technologies to solve old and new challenges thanks to its visual, no-code interface. It does so by covering the ML workflow end-to-end: whether you’re looking for powerful data […]
( 9
min )
This post was co-written with Greg Benson, Chief Scientist; Aaron Kesler, Sr. Product Manager; and Rich Dill, Enterprise Solutions Architect from SnapLogic. Many customers are building generative AI apps on Amazon Bedrock and Amazon CodeWhisperer to create code artifacts based on natural language. This use case highlights how large language models (LLMs) are able to […]
( 17
min )
As a surrogate for computationally intensive meso-scale simulation of woven
composites, this article presents Recurrent Neural Network (RNN) models.
Leveraging the power of transfer learning, the initialization challenges and
sparse data issues inherent in cyclic shear strain loads are addressed in the
RNN models. A mean-field model generates a comprehensive data set representing
elasto-plastic behavior. In simulations, arbitrary six-dimensional strain
histories are used to predict stresses, with random-walk loading as the source
task and cyclic loading as the target task. Incorporating sub-scale
properties enhances RNN versatility. In order to achieve accurate predictions,
the model uses a grid search method to tune network architecture and
hyper-parameter configurations. The results of this study demonstrate that
transfer learning can be used to effectively adapt the RNN to varying strain
conditions, which establishes its potential as a useful tool for modeling
path-dependent responses in woven composites.
( 2
min )
In safety-critical domains such as autonomous driving and medical diagnosis,
the reliability of machine learning models is crucial. One significant
challenge to reliability is concept drift, which can cause model deterioration
over time. Traditionally, drift detectors rely on true labels, which are often
scarce and costly. This study conducts a comprehensive empirical evaluation of
using uncertainty values as substitutes for error rates in detecting drifts,
aiming to alleviate the reliance on labeled post-deployment data. We examine
five uncertainty estimation methods in conjunction with the ADWIN detector
across seven real-world datasets. Our results reveal that while the SWAG method
exhibits superior calibration, the overall accuracy in detecting drifts is not
notably impacted by the choice of uncertainty estimation method, with even the
most basic method demonstrating competitive performance. These findings offer
valuable insights into the practical applicability of uncertainty-based drift
detection in real-world, safety-critical applications.
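The uncertainty-based detection idea can be illustrated without a full ADWIN implementation. Below is a deliberately simplified fixed-window stand-in that treats model uncertainty scores as a stream and compares a recent window against a reference window in standard-error units; ADWIN itself uses adaptive windows with a Hoeffding-style bound.

```python
import numpy as np

def detect_drift(uncertainties, window=50, threshold=3.0):
    """Flag drift when recent mean uncertainty departs from a reference window.

    A deliberately simplified, fixed-window stand-in for ADWIN: compare the
    mean of the latest `window` scores against the first `window` scores,
    measured in standard-error units of the reference window."""
    u = np.asarray(uncertainties, dtype=float)
    ref, recent = u[:window], u[-window:]
    se = ref.std(ddof=1) / np.sqrt(window) + 1e-12
    return abs(recent.mean() - ref.mean()) / se > threshold

rng = np.random.default_rng(0)
stable = rng.normal(0.10, 0.02, 200)                              # calibrated regime
drifted = np.concatenate([stable, rng.normal(0.40, 0.02, 100)])   # drift onset
```

The point of the study is that feeding such a detector uncertainty scores instead of error rates removes the need for post-deployment labels.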
( 2
min )
This paper introduces a new model to generate rhythmically relevant
non-verbal facial behaviors for virtual agents while they speak. The model
demonstrates perceived performance comparable to behaviors directly extracted
from the data and replayed on a virtual agent, in terms of synchronization with
speech and believability. Interestingly, we found that training the model with
two different sets of data, instead of one, did not necessarily improve its
performance. The expressiveness of the people in the dataset and the shooting
conditions are key elements. We also show that employing an adversarial model,
in which fabricated examples are introduced during the training phase,
increases the perception of synchronization with speech. A collection of videos
demonstrating the results and code can be accessed at:
https://github.com/aldelb/non_verbal_facial_animation.
( 2
min )
Due to its predominantly asymptomatic or mildly symptomatic progression, lung
cancer is often diagnosed in advanced stages, resulting in poorer survival
rates for patients. As with other cancers, early detection significantly
improves the chances of successful treatment. Early diagnosis can be
facilitated through screening programs designed to detect lung tissue tumors
when they are still small, typically around 3mm in size. However, the analysis
of extensive screening program data is hampered by limited access to medical
experts. In this study, we developed a procedure for identifying potential
malignant neoplastic lesions within lung parenchyma. The system leverages
machine learning (ML) techniques applied to two types of measurements: low-dose
Computed Tomography-based radiomics and metabolomics. Using data from two
Polish screening programs, two ML algorithms were tested, along with various
integration methods, to create a final model that combines both modalities to
support lung cancer screening.
( 2
min )
This manuscript presents an advanced framework for Bayesian learning by
incorporating action and state-dependent signal variances into decision-making
models. This framework is pivotal in understanding complex data-feedback loops
and decision-making processes in various economic systems. Through a series of
examples, we demonstrate the versatility of this approach in different
contexts, ranging from simple Bayesian updating in stable environments to
complex models involving social learning and state-dependent uncertainties. The
paper uniquely contributes to the understanding of the nuanced interplay
between data, actions, outcomes, and the inherent uncertainty in economic
models.
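The simplest instance of action- and state-dependent signal variance is a conjugate Gaussian update in which the observation noise is a function of the chosen action. The sketch below, with a hypothetical `noise_for_action` mapping, shows how a noisier signal pulls the posterior less far from the prior:

```python
def gaussian_update(mu, tau2, y, sigma2):
    """Conjugate update of a Gaussian belief N(mu, tau2) about theta after
    observing y ~ N(theta, sigma2), where sigma2 may depend on action/state."""
    post_tau2 = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mu = post_tau2 * (mu / tau2 + y / sigma2)
    return post_mu, post_tau2

def noise_for_action(action):
    """Hypothetical action-dependent signal variance: the 'cautious' action
    yields a much noisier signal than the 'explore' action."""
    return 4.0 if action == "cautious" else 0.25

mu0, tau2_0 = 0.0, 1.0   # prior belief about the unknown state theta
mu_c, tau2_c = gaussian_update(mu0, tau2_0, y=1.0, sigma2=noise_for_action("cautious"))
mu_e, tau2_e = gaussian_update(mu0, tau2_0, y=1.0, sigma2=noise_for_action("explore"))
```

Because the informativeness of the signal depends on the action taken, the decision-maker faces an explicit trade-off between exploiting current beliefs and choosing actions that sharpen them.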
( 2
min )
Convolutional Neural Networks (CNNs) have greatly influenced the field of
Embedded Vision and Edge Artificial Intelligence (AI), enabling powerful
machine learning capabilities on resource-constrained devices. This article
explores the relationship between CNN compute requirements and memory bandwidth
in the context of Edge AI. We delve into the historical progression of CNN
architectures, from the early pioneering models to the current state-of-the-art
designs, highlighting the advancements in compute-intensive operations. We
examine the impact of increasing model complexity on both computational
requirements and memory access patterns. The paper presents a comparative
analysis of the evolving trade-off between compute demands and memory bandwidth
requirements in CNNs. This analysis provides insights into designing efficient
architectures and potential hardware accelerators for enhancing CNN performance
on edge devices.
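The compute-versus-bandwidth trade-off discussed here can be made concrete with back-of-the-envelope arithmetic for a single convolution layer: MACs scale with spatial size and both channel counts, while memory traffic is dominated by activations at high resolution and by weights at low resolution. A rough sketch, assuming stride 1, 'same' padding, FP16 tensors, and no caching or tiling effects:

```python
def conv2d_costs(h, w, c_in, c_out, k, bytes_per_elem=2):
    """Back-of-the-envelope compute and memory-traffic estimates for one
    stride-1, 'same'-padded convolution (FP16, weights read once, no caching)."""
    macs = h * w * c_in * c_out * k * k                  # multiply-accumulates
    traffic = (h * w * c_in                              # read input activations
               + h * w * c_out                           # write output activations
               + c_in * c_out * k * k) * bytes_per_elem  # read weights
    return macs, traffic, macs / traffic                 # arithmetic intensity (MAC/byte)

# An early high-resolution layer versus a deep channel-heavy layer: the deep
# layer performs far more MACs per byte moved, i.e. it tends to be
# compute-bound, while the early layer is bandwidth-bound.
shallow = conv2d_costs(224, 224, 3, 64, 3)
deep = conv2d_costs(14, 14, 512, 512, 3)
```

Arithmetic intensity is the quantity a roofline analysis compares against the accelerator's FLOP-to-bandwidth ratio to decide which resource limits a layer.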
( 2
min )
Bowers and colleagues argue that DNNs are poor models of biological vision
because they often learn to rival human accuracy by relying on strategies that
differ markedly from those of humans. We show that this problem is worsening as
DNNs are becoming larger-scale and increasingly more accurate, and prescribe
methods for building DNNs that can reliably model biological vision.
( 2
min )
Robotic capacities in object manipulation are incomparable to those of
humans. Besides years of learning, humans rely heavily on the richness of
information from physical interaction with the environment. In particular,
tactile sensing is crucial in providing such rich feedback. Despite its
potential contributions to robotic manipulation, tactile sensing remains
underexploited, mainly due to the complexity of the time series provided by tactile
sensors. In this work, we propose a method for assessing grasp stability using
tactile sensing. More specifically, we propose a methodology to extract
task-relevant features and design efficient classifiers to detect object
slippage with respect to individual fingertips. We compare two classification
models: support vector machine and logistic regression. We use highly sensitive
Uskin tactile sensors mounted on an Allegro hand to test and validate our
method. Our results demonstrate that the proposed method is effective in
slippage detection in an online fashion.
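One of the two compared classifiers, logistic regression, is simple enough to sketch from scratch. The features below (per-window shear-force mean and variance for a fingertip) are invented stand-ins for the paper's task-relevant tactile features, and the data are synthetic:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, epochs=300):
    """Plain logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted slip probability
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Invented stand-in features: per-window shear-force mean and variance for one
# fingertip; slipping windows show markedly higher force variance.
rng = np.random.default_rng(0)
stable = np.column_stack([rng.normal(1.0, 0.2, 200), rng.normal(0.1, 0.05, 200)])
slip = np.column_stack([rng.normal(1.0, 0.2, 200), rng.normal(0.8, 0.05, 200)])
X = np.vstack([stable, slip])
y = np.concatenate([np.zeros(200), np.ones(200)])
w, b = train_logreg(X, y)
acc = np.mean(((X @ w + b) > 0).astype(float) == y)
```

A model this light is what makes online, per-fingertip slippage detection feasible at sensor frame rates.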
( 2
min )
Learning and forecasting stochastic time series is essential in various
scientific fields. However, despite the proposals of nonlinear filters and
deep-learning methods, it remains challenging to capture nonlinear dynamics
from a few noisy samples and predict future trajectories with uncertainty
estimates while maintaining computational efficiency. Here, we propose a fast
algorithm to learn and forecast nonlinear dynamics from noisy time series data.
A key feature of the proposed model is kernel functions applied to projected
lines, enabling fast and efficient capture of nonlinearities in the latent
dynamics. Through empirical case studies and benchmarking, the model
demonstrates its effectiveness in learning and forecasting complex nonlinear
dynamics, offering a valuable tool for researchers and practitioners in time
series analysis.
( 2
min )
When working with multiple variables, they usually contain complex dependencies
that are difficult to control. This article proposes extracting their individual
information, e.g. $\overline{X|Y}$ as random variable containing information
from $X$, but with removed information about $Y$, by using $(x,y)
\leftrightarrow (\bar{x}=\textrm{CDF}_{X|Y=y}(x),y)$ reversible normalization.
One application can be decoupling of individual information of variables:
reversibly transform $(X_1,\ldots,X_n)\leftrightarrow(\tilde{X}_1,\ldots
\tilde{X}_n)$ together containing the same information, but being independent:
$\forall_{i\neq j} \tilde{X}_i\perp \tilde{X}_j, \tilde{X}_i\perp X_j$. It
requires detailed models of complex conditional probability distributions - it
is generally a difficult task, but here can be done through multiple dependency
reducing iterations, using imperfect methods (here HCR: Hierarchical
Correlation Reconstruction). It could also be used for direct mutual
information, evaluating direct information transfer without the use of
intermediate variables. For the direction of causality, multi-feature Granger
causality is discussed, e.g. to trace various types of individual information
transfer between such decoupled variables, including propagation time (delay).
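The core normalization, $\bar{x}=\textrm{CDF}_{X|Y=y}(x)$, is easy to illustrate when the conditional distribution is known: the transformed variable is Uniform(0,1) and carries no information about $Y$. A stdlib-only sketch assuming the toy model $X\,|\,Y{=}y \sim N(y,1)$; the paper instead estimates the conditional CDF from data with HCR.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def remove_y_information(x, y):
    """Map x -> CDF_{X|Y=y}(x) under the assumed model X | Y=y ~ N(y, 1).
    The result is Uniform(0, 1) and independent of Y; given y, the map is
    reversed by the conditional quantile function."""
    return norm_cdf(x - y)

rng = random.Random(0)
ys = [rng.gauss(0.0, 3.0) for _ in range(4000)]
xs = [y + rng.gauss(0.0, 1.0) for y in ys]          # X strongly depends on Y
xbar = [remove_y_information(x, y) for x, y in zip(xs, ys)]

# After the transform, xbar should be uniform and uncorrelated with Y.
mean_xbar = sum(xbar) / len(xbar)
my = sum(ys) / len(ys)
cov = sum((a - mean_xbar) * (b - my) for a, b in zip(xbar, ys)) / len(ys)
```

The transform is reversible because, given $y$, the conditional quantile function maps $\bar{x}$ back to $x$ exactly.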
( 2
min )
Multi-objective optimization (MOO) aims to optimize multiple, possibly
conflicting objectives with widespread applications. We introduce a novel
interacting particle method for MOO inspired by molecular dynamics simulations.
Our approach combines overdamped Langevin and birth-death dynamics,
incorporating a "dominance potential" to steer particles toward global Pareto
optimality. In contrast to previous methods, our method is able to relocate
dominated particles, making it particularly adept at managing Pareto fronts of
complicated geometries. Our method is also theoretically grounded as a
Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive
experiments confirm that our approach outperforms state-of-the-art methods on
challenging synthetic and real-world datasets.
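The "dominance" notion steering the particles is the standard Pareto one. A minimal numpy sketch of the dominance test and of extracting the non-dominated set from a particle population, under the minimization convention:

```python
import numpy as np

def dominates(a, b):
    """a Pareto-dominates b: no worse on every objective, strictly better on one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(points):
    """Return the non-dominated subset of a particle population (minimization)."""
    pts = np.asarray(points, dtype=float)
    keep = [i for i, p in enumerate(pts)
            if not any(dominates(q, p) for j, q in enumerate(pts) if j != i)]
    return pts[keep]

# (4, 4) is dominated by (2, 2); the other three points form the front.
pop = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (4.0, 4.0)]
front = pareto_front(pop)
```

In the proposed method, a dominated particle like (4, 4) would be relocated by the birth-death dynamics rather than simply discarded.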
( 2
min )
By analyzing bacterial data, researchers have discovered thousands of rare new CRISPR systems that have a range of functions and could enable gene editing, diagnostics, and more.
( 10
min )
GeForce NOW is bringing 18 new games to the cloud this week, part of a gratitude-filled GFN Thursday. A collaboration between Chromebook Plus, CD PROJEKT RED and GeForce NOW brought an immersive 3D activation to Times Square over the weekend, containing a hidden Easter egg for Cyberpunk 2077 players. Plus, this holiday season, give the…
( 5
min )
In this article, we consider the problem of approximating a finite set of
data (usually huge in applications) by invariant subspaces generated through a
small set of smooth functions. The invariance is either by translations under a
full-rank lattice or through the action of crystallographic groups. Smoothness
is ensured by stipulating that the generators belong to a Paley-Wiener space,
that is selected in an optimal way based on the characteristics of the given
data. To complete our investigation, we analyze the fundamental role played by
the lattice in the process of approximation.
( 2
min )
We study the problem of solving strongly convex and smooth unconstrained
optimization problems using stochastic first-order algorithms. We devise a
novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (\ROOTSGD),
based on an easily implementable, recursive averaging of past stochastic
gradients. We prove that it simultaneously achieves state-of-the-art
performance in both a finite-sample, nonasymptotic sense and an asymptotic
sense. On the nonasymptotic side, we prove risk bounds on the last iterate of
\ROOTSGD with leading-order terms that match the optimal statistical risk with
a unity pre-factor, along with a higher-order term that scales at the sharp
rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On
the asymptotic side, we show that when a mild, one-point Hessian continuity
condition is imposed, the rescaled last iterate of (multi-epoch) \ROOTSGD
converges asymptotically to a Gaussian limit with the Cram\'{e}r-Rao optimal
asymptotic covariance, for a broad range of step-size choices.
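The recursive averaging at the heart of \ROOTSGD can be sketched on a one-dimensional strongly convex toy problem. This follows the update $v_t = g(x_t;\xi_t) + (1-1/t)\,(v_{t-1} - g(x_{t-1};\xi_t))$, with the same sample evaluated at consecutive iterates so that noise partially cancels; it is an illustration of the recursion only, not the paper's full multi-epoch variant.

```python
import random

def root_sgd(grad, x0, steps, lr=0.1, seed=0):
    """Recursive One-Over-T SGD sketch (scalar case).

    v_t = g(x_t; xi_t) + (1 - 1/t) * (v_{t-1} - g(x_{t-1}; xi_t)):
    the same sample xi_t is evaluated at two consecutive iterates, so the
    noise in the correction term partially cancels and v_t behaves like a
    variance-reduced running average of stochastic gradients."""
    rng = random.Random(seed)
    x, x_prev, v = x0, x0, 0.0
    for t in range(1, steps + 1):
        xi = rng.gauss(0.0, 1.0)
        if t == 1:
            v = grad(x, xi)
        else:
            v = grad(x, xi) + (1.0 - 1.0 / t) * (v - grad(x_prev, xi))
        x_prev, x = x, x - lr * v
    return x

# Toy problem: f(x) = 0.5 * (x - 3)^2 with additive N(0, 1) gradient noise.
noisy_grad = lambda x, xi: (x - 3.0) + xi
x_hat = root_sgd(noisy_grad, x0=0.0, steps=5000, lr=0.1)
```

Because the noise entering $v_t$ is damped by the $1/t$ weighting, the last iterate settles near the minimizer without any explicit iterate averaging.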
( 2
min )
Machine Learning (ML) and Algorithmic Information Theory (AIT) look at
Complexity from different points of view. We explore the interface between AIT
and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on
the problem of learning kernels from data, in kernel ridge regression, through
the method of Sparse Kernel Flows. In particular, by looking at the differences
and commonalities between Minimal Description Length (MDL) and Regularization
in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is
the natural approach to adopt to learn kernels from data. This paper shows that
it is not necessary to use the statistical route to derive Sparse Kernel Flows
and that one can directly work with code-lengths and complexities that are
concepts that show up in AIT.
( 2
min )
We introduce a new Langevin dynamics based algorithm, called
e-TH$\varepsilon$O POULA, to solve optimization problems with discontinuous
stochastic gradients which naturally appear in real-world applications such as
quantile estimation, vector quantization, CVaR minimization, and regularized
optimization problems involving ReLU neural networks. We demonstrate both
theoretically and numerically the applicability of the e-TH$\varepsilon$O POULA
algorithm. More precisely, under the conditions that the stochastic gradient is
locally Lipschitz in average and satisfies a certain convexity at infinity
condition, we establish non-asymptotic error bounds for e-TH$\varepsilon$O
POULA in Wasserstein distances and provide a non-asymptotic estimate for the
expected excess risk, which can be controlled to be arbitrarily small. Three
key applications in finance and insurance are provided, namely, multi-period
portfolio optimization, transfer learning in multi-period portfolio
optimization, and insurance claim prediction, which involve neural networks
with (Leaky)-ReLU activation functions. Numerical experiments conducted using
real-world datasets illustrate the superior empirical performance of
e-TH$\varepsilon$O POULA compared to SGLD, TUSLA, ADAM, and AMSGrad in terms of
model accuracy.
( 2
min )
Networks are ubiquitous in many real-world applications (e.g., social
networks encoding trust/distrust relationships, correlation networks arising
from time series data). While many networks are signed or directed, or both,
there is a lack of unified software packages on graph neural networks (GNNs)
specially designed for signed and directed networks. In this paper, we present
PyTorch Geometric Signed Directed (PyGSD), a software package which fills this
gap. Along the way, we evaluate the implemented methods experimentally, with a
view to providing insights into which method to choose for a given task. The
deep learning framework consists of easy-to-use GNN models, synthetic and
real-world data, as well as task-specific evaluation metrics and loss functions
for signed and directed networks. As an extension library for PyG, our proposed
software is maintained with open-source releases, detailed documentation,
continuous integration, unit tests and code coverage checks. The GitHub
repository of the library is
https://github.com/SherylHYX/pytorch_geometric_signed_directed.
( 3
min )
Sequential neural posterior estimation (SNPE) techniques have been recently
proposed for dealing with simulation-based models with intractable likelihoods.
Unlike approximate Bayesian computation, SNPE techniques learn the posterior
from sequential simulation using neural network-based conditional density
estimators. This paper revisits SNPE-B, proposed by Lueckmann et al. (2017),
which suffers from slow inference due to inefficient utilization of simulated
data and high variance of parameter updates. To address these issues, we first
introduce a concentrated loss function based
on an adaptive calibration kernel that reweights the simulated data
appropriately to improve the data efficiency. Moreover, we provide a
theoretical analysis of the variance of associated Monte Carlo estimators.
Based on this analysis, we then propose several variance reduction techniques
to further accelerate the process of learning. Numerical experiments
demonstrate that our method outperforms the original method together with other
existing competitors on certain tasks.
( 2
min )
In real-world reinforcement learning problems, the state information is often
only partially observable, which breaks the basic assumption in Markov decision
processes and thus leads to inferior performance. Partially Observable Markov
Decision Processes have been introduced to take the issue explicitly into
account for learning, exploration, and planning, but they present significant
computational and statistical challenges. To address these
difficulties, we exploit the representation view, which leads to a coherent
design framework for a practically tractable reinforcement learning algorithm
upon partial observations. We provide a theoretical analysis for justifying the
statistical efficiency of the proposed algorithm. We also empirically
demonstrate the proposed algorithm can surpass state-of-the-art performance
with partial observations across various benchmarks, therefore, pushing
reliable reinforcement learning towards more practical applications.
( 2
min )
This is a guest post by A.K Roy from Qualcomm AI. Amazon Elastic Compute Cloud (Amazon EC2) DL2q instances, powered by Qualcomm AI 100 Standard accelerators, can be used to cost-efficiently deploy deep learning (DL) workloads in the cloud. They can also be used to develop and validate performance and accuracy of DL workloads that […]
( 9
min )
The financial service (FinServ) industry has unique generative AI requirements related to domain-specific data, data security, regulatory controls, and industry compliance standards. In addition, customers are looking for choices to select the most performant and cost-effective machine learning (ML) model and the ability to perform necessary customization (fine-tuning) to fit their business use cases. Amazon […]
( 11
min )
The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and […]
( 14
min )
Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. By using the Framework, you will learn current operational and architectural recommendations for designing and operating […]
( 11
min )
The IDP Well-Architected Custom Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build a secure, efficient, and reliable IDP solution on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer […]
( 13
min )
When a customer has a production-ready intelligent document processing (IDP) workload, we often receive requests for a Well-Architected review. To build an enterprise solution, developer resources, cost, time and user-experience have to be balanced to achieve the desired business outcome. The AWS Well-Architected Framework provides a systematic way for organizations to learn operational and architectural […]
( 10
min )
Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language […]
( 13
min )
An intelligent document processing (IDP) project typically combines optical character recognition (OCR) and natural language processing (NLP) to automatically read and understand documents. Customers across all industries run IDP workloads on AWS to deliver business value by automating use cases such as KYC forms, tax documents, invoices, insurance claims, delivery reports, inventory reports, and more. […]
( 11
min )
For decades, Amazon has pioneered and innovated machine learning (ML), bringing delightful experiences to its customers. From the earliest days, Amazon has used ML for various use cases such as book recommendations, search, and fraud detection. Similar to the rest of the industry, the advancements of accelerated hardware have allowed Amazon teams to pursue model […]
( 11
min )
Today, geospatial workflows typically consist of loading data, transforming it, and then producing visual insights like maps, text, or charts. Generative AI can automate these tasks through autonomous agents. In this post, we discuss how to use foundation models from Amazon Bedrock to power agents to complete geospatial tasks. These agents can perform various tasks […]
( 11
min )
A new deep-learning compiler for dynamic sparsity; Tongue Tap could make tongue gestures viable for VR/AR headsets; Ranking LLM-Generated Loop Invariants for Program Verification; Assessing the limits of zero-shot foundation models in single-cell biology.
The post Research Focus: Week of November 22, 2023 appeared first on Microsoft Research.
( 10
min )
A calendar packed with meetings, calls and lab visits may sound like a typical workday for many — but for Luca Lofranco, whose greatest wish was to experience what it’s like to work at NVIDIA, it was a dream come true. Eighteen-year-old Lofranco recently traveled from his hometown near Toronto, Canada, to spend the day […]
( 6
min )
Talk about going after low-hanging fruit. Afresh is an AI startup that helps grocery stores and retailers reduce food waste by making supply chains more efficient. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with the company’s cofounder and president, Nathan Fenner, about its mission, offerings and the greater challenge of […]
( 5
min )
AI-based medical technologies, including wearables, telemedicine, LLMs, and
digital care twins, significantly impact healthcare. Ensuring AI results are
accurate and interpretable is crucial, especially for clinicians. This paper
reviews processes and challenges of interpretable ML (IML) and explainable AI
(XAI) in healthcare. Objectives include reviewing XAI processes, methods,
applications, and challenges, with a focus on quality control. The IML process
is classified into data pre-processing interpretability, interpretable
modeling, and post-processing interpretability. The paper aims to establish the
importance of robust interpretability in healthcare through experimental
results, providing insights for creating communicable clinician-AI tools.
Research questions, eligibility criteria, and goals were identified following
PRISMA and PICO methods. PubMed, Scopus, and Web of Science were systematically
searched using specific strings. The survey introduces a step-by-step roadmap
for implementing XAI in clinical applications, addressing existing gaps and
acknowledging XAI model limitations.
( 2
min )
This study introduces a novel forecasting strategy that leverages the power
of fractional differencing (FD) to capture both short- and long-term
dependencies in time series data. Unlike traditional integer differencing
methods, FD preserves memory in series while stabilizing it for modeling
purposes. By applying FD to financial data from the SPY index and incorporating
sentiment analysis from news reports, this empirical analysis explores the
effectiveness of FD in conjunction with binary classification of target
variables. Supervised classification algorithms were employed to validate the
performance of FD series. The results demonstrate the superiority of FD over
integer differencing, as confirmed by Receiver Operating Characteristic Area
Under the Curve (ROC AUC) and Matthews Correlation Coefficient (MCC) evaluations.
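The fractional-differencing step described here can be sketched from the standard binomial expansion of $(1-B)^d$; the snippet below is a generic illustration, not the authors' code, and the truncation length `n` is an arbitrary choice.

```python
import numpy as np

def frac_diff_weights(d, n):
    """Weights of the truncated binomial expansion of (1 - B)^d."""
    w = [1.0]
    for k in range(1, n):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d, n=20):
    """Fixed-window fractional differencing of order d on a 1-D float array."""
    w = frac_diff_weights(d, n)
    out = np.convolve(series, w, mode="full")[: len(series)]
    out[: n - 1] = np.nan  # first n-1 points lack a full weight window
    return out
```

With `d=1` this reduces to ordinary first differencing; non-integer `d` keeps a slowly decaying memory of past values, which is the property the abstract exploits.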
( 2
min )
The infinitely wide neural network has been proven a useful and manageable
mathematical model that enables the understanding of many phenomena appearing
in deep learning. One example is the convergence of random deep networks to
Gaussian processes that allows a rigorous analysis of the way the choice of
activation function and network weights impacts the training dynamics. In this
paper, we extend the seminal proof of Matthews et al. (2018) to a larger class
of initial weight distributions (which we call PSEUDO-IID), including the
established cases of IID and orthogonal weights, as well as the emerging
low-rank and structured sparse settings celebrated for their computational
speed-up benefits. We show that fully-connected and convolutional networks
initialized with PSEUDO-IID distributions are all effectively equivalent up to
their variance. Using our results, one can identify the Edge-of-Chaos for a
broader class of neural networks and tune them at criticality in order to
enhance their training.
( 2
min )
Metarounding is an approach to convert an approximation algorithm for linear
optimization over some combinatorial classes to an online linear optimization
algorithm for the same class. We propose a new metarounding algorithm under a
natural assumption that a relax-based approximation algorithm exists for the
combinatorial class. Our algorithm is much more efficient in both theoretical
and practical aspects.
( 2
min )
Text-based game environments are challenging because agents must deal with
long sequences of text, execute compositional actions using text and learn from
sparse rewards. We address these challenges by proposing Language Decision
Transformers (LDTs), a framework that is based on transformer language models
and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1)
exponential tilt to guide the agent towards high obtainable goals, (2) novel
goal conditioning methods yielding better results than the traditional
return-to-go (sum of all future rewards), and (3) a model of future
observations that improves agent performance. LDTs are the first to address
offline RL with DTs on these challenging games. Our experiments show that LDTs
achieve the highest scores among many different types of agents on some of the
most challenging Jericho games, such as Enchanter.
( 2
min )
There is no convincing evidence that backpropagation is a biologically
plausible mechanism, and further studies of alternative learning methods are
needed. A novel online clustering algorithm is presented that can produce
arbitrarily shaped clusters from inputs in an unsupervised manner, and requires
no prior knowledge of the number of clusters in the input data. This is
achieved by finding correlated outputs from functions that capture commonly
occurring input patterns. The algorithm can be deemed more biologically
plausible than model optimization through backpropagation, although practical
applicability may require additional research. However, the method yields
satisfactory results on several toy datasets on a noteworthy range of
hyperparameters.
( 2
min )
Successful deployment of multi-agent reinforcement learning often requires
agents to adapt their behaviour. In this work, we discuss the problem of
teamwork adaptation in which a team of agents needs to adapt their policies to
solve novel tasks with limited fine-tuning. Motivated by the intuition that
agents need to be able to identify and distinguish tasks in order to adapt
their behaviour to the current task, we propose to learn multi-agent task
embeddings (MATE). These task embeddings are trained using an encoder-decoder
architecture optimised for reconstruction of the transition and reward
functions which uniquely identify tasks. We show that a team of agents is able
to adapt to novel tasks when provided with task embeddings. We propose three
MATE training paradigms: independent MATE, centralised MATE, and mixed MATE
which vary in the information used for the task encoding. We show that the
embeddings learned by MATE identify tasks and provide useful information which
agents leverage during adaptation to novel tasks.
( 2
min )
In this paper, we introduce a novel and computationally efficient method for
vertex embedding, community detection, and community size determination. Our
approach leverages a normalized one-hot graph encoder and a rank-based cluster
size measure. Through extensive simulations, we demonstrate the excellent
numerical performance of our proposed graph encoder ensemble algorithm.
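The encoder step can be illustrated concretely. Below is a hedged sketch of a normalized one-hot graph encoder in the spirit described (each vertex embedded by its size-normalized connectivity to each community); the ensemble and rank-based cluster-size measure from the abstract would be layered on top of repeated encodings like this one.

```python
import numpy as np

def one_hot_graph_encoder(A, labels, K):
    """Embed each vertex by its size-normalized connectivity to each community.

    A: (n, n) adjacency matrix; labels: length-n community assignments in 0..K-1.
    Returns an (n, K) embedding.
    """
    n = A.shape[0]
    W = np.zeros((n, K))
    for k in range(K):
        members = labels == k
        W[members, k] = 1.0 / members.sum()  # normalize the one-hot column
    return A @ W  # average connectivity of each vertex to each community
```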
( 2
min )
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by this operation is
usually neglected, sacrificed in order to reach a fast computation. We
propose to couple the model reduction with machine-learned residual learning,
such that the above-mentioned error can be learned by a neural network and
inferred for new predictions. We emphasize that the framework maximizes the
exploitation of high-fidelity information, using it for building the reduced
order model and for learning the residual. In this work, we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
( 2
min )
Carefully standardized facial images of 591 participants were taken in the
laboratory, while controlling for self-presentation, facial expression, head
orientation, and image properties. They were presented to human raters and a
facial recognition algorithm: both humans (r=.21) and the algorithm (r=.22)
could predict participants' scores on a political orientation scale (Cronbach's
alpha=.94) decorrelated with age, gender, and ethnicity. These effects are on
par with how well job interviews predict job success, or how strongly alcohol
consumption drives aggressiveness. The algorithm's predictive accuracy was even higher (r=.31) when it
leveraged information on participants' age, gender, and ethnicity. Moreover,
the associations between facial appearance and political orientation seem to
generalize beyond our sample: The predictive model derived from standardized
images (while controlling for age, gender, and ethnicity) could predict
political orientation (r=.13) from naturalistic images of 3,401 politicians
from the U.S., UK, and Canada. The analysis of facial features associated with
political orientation revealed that conservatives tended to have larger lower
faces. The predictability of political orientation from standardized images has
critical implications for privacy, the regulation of facial recognition
technology, and understanding the origins and consequences of political
orientation.
( 3
min )
The rapid mutation of the influenza virus threatens public health.
Reassortment among viruses with different hosts can lead to a fatal pandemic.
However, it is difficult to detect the original host of the virus during or
after an outbreak as influenza viruses can circulate between different species.
Therefore, early and rapid detection of the viral host would help reduce the
further spread of the virus. We use various machine learning models with
features derived from the position-specific scoring matrix (PSSM) and features
learned from word embedding and word encoding to infer the origin host of
viruses. The results show that the PSSM-based model reaches an MCC of around
95% and an F1 score of around 96%, while the model with word embeddings
attains an MCC of around 96% and an F1 score of around 97%.
( 2
min )
Modern time series classifiers display impressive predictive capabilities,
yet their decision-making processes mostly remain black boxes to the user. At
the same time, model-agnostic explainers, such as the recently proposed SHAP,
promise to make the predictions of machine learning models interpretable,
provided there are well-designed domain mappings. We bring both worlds together
in our timeXplain framework, extending the reach of explainable artificial
intelligence to time series classification and value prediction. We present
novel domain mappings for the time domain, frequency domain, and time series
statistics and analyze their explicative power as well as their limits. We
employ a novel evaluation metric to experimentally compare timeXplain to
several model-specific explanation approaches for state-of-the-art time series
classifiers.
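A minimal occlusion-style sketch conveys what a time-domain mapping does; timeXplain's actual mappings are SHAP-based and more sophisticated, so treat the following as a simplified stand-in in which `predict`, the segment count, and the mean baseline are all placeholder choices.

```python
import numpy as np

def segment_importance(predict, x, n_segments=4, baseline=None):
    """Occlusion-style attribution: replace each time segment with a baseline
    (the series mean by default) and record the change in the prediction."""
    if baseline is None:
        baseline = np.full_like(x, x.mean())
    ref = predict(x)
    scores = []
    for seg in np.array_split(np.arange(len(x)), n_segments):
        xm = x.copy()
        xm[seg] = baseline[seg]  # occlude this segment
        scores.append(ref - predict(xm))
    return np.array(scores)
```

Segments whose occlusion leaves the prediction unchanged receive zero importance; segments the model relies on receive nonzero scores.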
( 2
min )
In this paper, we explore the structure of the penultimate Gram matrix in
deep neural networks, which contains the pairwise inner products of outputs
corresponding to a batch of inputs. In several architectures it has been
observed that this Gram matrix becomes degenerate with depth at initialization,
which dramatically slows training. Normalization layers, such as batch or layer
normalization, play a pivotal role in preventing the rank collapse issue.
Despite promising advances, the existing theoretical results do not extend to
layer normalization, which is widely used in transformers, and cannot
quantitatively characterize the role of non-linear activations. To bridge this
gap, we prove that layer normalization, in conjunction with activation layers,
biases the Gram matrix of a multilayer perceptron towards the identity matrix
at an exponential rate with depth at initialization. We quantify this rate
using the Hermite expansion of the activation function.
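The contrast can be seen in a toy numpy experiment (a sketch, not the paper's Hermite analysis): propagate two inputs through a deep random ReLU MLP with and without per-layer normalization and compare the off-diagonal cosine of the resulting Gram matrix. The width, depth, and scaling choices here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(X):
    """Per-sample normalization over features (no learned scale/shift)."""
    X = X - X.mean(axis=1, keepdims=True)
    return X / X.std(axis=1, keepdims=True)

def gram_cosine(X):
    """Normalized off-diagonal entry of the 2x2 Gram matrix X @ X.T."""
    G = X @ X.T
    return G[0, 1] / np.sqrt(G[0, 0] * G[1, 1])

width, depth = 512, 50
X0 = rng.standard_normal((2, width))  # a batch of two inputs
plain, normed = X0.copy(), X0.copy()
for _ in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    plain = np.sqrt(2.0) * np.maximum(plain @ W, 0.0)  # variance-preserving ReLU
    normed = layer_norm(np.maximum(normed @ W, 0.0))   # ReLU + layer norm

# Without normalization the two outputs align (the Gram matrix degenerates);
# with layer norm the Gram matrix stays close to a multiple of the identity.
print(gram_cosine(plain), abs(gram_cosine(normed)))
```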
( 2
min )
Despite the recent advancements in offline reinforcement learning via
supervised learning (RvS) and the success of the decision transformer (DT)
architecture in various domains, DTs have fallen short in several challenging
benchmarks. The root cause of this underperformance lies in their inability to
seamlessly connect segments of suboptimal trajectories. To overcome this
limitation, we present a novel approach to enhance RvS methods by integrating
intermediate targets. We introduce the Waypoint Transformer (WT), using an
architecture that builds upon the DT framework and is conditioned on
automatically generated waypoints. The results show a significant increase in
the final return compared to existing RvS methods, with performance on par or
greater than existing state-of-the-art temporal difference learning-based
methods. Additionally, the performance and stability improvements are largest
in the most challenging environments and data configurations, including AntMaze
Large Play/Diverse and Kitchen Mixed/Partial.
( 2
min )
In this paper, we extend an available neural network verification technique
to support a wider class of piece-wise linear activation functions.
Furthermore, we extend the algorithms, which in their original form provide
exact or over-approximative results for bounded input sets represented as
star sets, to also handle unbounded input sets. We implemented
our algorithms and demonstrated their effectiveness in some case studies.
( 2
min )
In today's rapidly evolving educational landscape, traditional modes of
passive information delivery are giving way to transformative pedagogical
approaches that prioritize active student engagement. Within the context of
large-scale hybrid classrooms, the challenge lies in fostering meaningful and
active interaction between students and course content. This study delves into
the significance of measuring students' earnestness during interactive lecture
participation exercises. By analyzing students' responses to interactive
lecture poll questions, establishing a clear rubric for evaluating earnestness,
and conducting a comprehensive assessment, we introduce EIT (Earnest Insight
Toolkit), a tool designed to assess students' engagement within interactive
lecture participation exercises - particularly in the context of large-scale
hybrid classrooms. Through the utilization of EIT, our objective is to equip
educators with valuable means of identifying at-risk students for enhancing
intervention and support strategies, as well as measuring students' levels of
engagement with course content.
( 2
min )
According to the literature, product reviews are an important source of
information for customers to support their buying decisions. Product reviews
improve customer trust and loyalty, help customers understand what other
customers think about a particular product, and drive purchase decisions. For
an e-commerce platform it is therefore important to understand the sentiment in
customer reviews, both to understand its products and services and to create
positive consumer interactions and long-lasting relationships. Reviews also
provide innovative ways to market products; one such approach is nudge
marketing, a subtle way for an e-commerce company to help its customers make
better decisions without hesitation.
( 2
min )
In sparse linear bandits, a learning agent sequentially selects an action and
receives reward feedback, and the reward function depends linearly on a few
coordinates of the covariates of the actions. This has applications in many
real-world sequential decision making problems. In this paper, we propose a
simple and computationally efficient sparse linear estimation method called
PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso
(Tibshirani, 1996) in many problems. Our bound naturally motivates an
experimental design criterion that is convex and thus computationally efficient
to solve. Based on our novel estimator and design criterion, we derive sparse
linear bandit algorithms that enjoy improved regret upper bounds upon the state
of the art (Hao et al., 2020), especially w.r.t. the geometry of the given
action set. Finally, we prove a matching lower bound for sparse linear bandits
in the data-poor regime, which closes the gap between upper and lower bounds in
prior work.
( 2
min )
In this work, we investigate the problem of public data-assisted
non-interactive LDP (Local Differential Privacy) learning with a focus on
non-parametric classification. Under the posterior drift assumption, we
derive, for the first time, the minimax optimal convergence rate under the LDP
constraint. Then, we present a novel approach, the locally private
classification tree, which attains the minimax optimal convergence rate.
Furthermore, we design a
data-driven pruning procedure that avoids parameter tuning and produces a fast
converging estimator. Comprehensive experiments conducted on synthetic and real
datasets show the superior performance of our proposed method. Both our
theoretical and experimental findings demonstrate the effectiveness of public
data compared to private data, which leads to practical suggestions for
prioritizing non-private data collection.
( 2
min )
We study the mean field Langevin dynamics and the associated particle system.
By assuming the functional convexity of the energy, we obtain the
$L^p$-convergence of the marginal distributions towards the unique invariant
measure for the mean field dynamics. Furthermore, we prove the uniform-in-time
propagation of chaos in both the $L^2$-Wasserstein metric and relative entropy.
( 2
min )
In domains where sample sizes are limited, efficient learning algorithms are
critical. Learning using privileged information (LuPI) offers increased sample
efficiency by allowing prediction models access to auxiliary information at
training time which is unavailable when the models are used. In recent work, it
was shown that for prediction in linear-Gaussian dynamical systems, a LuPI
learner with access to intermediate time series data is never worse and often
better in expectation than any unbiased classical learner. We provide new
insights into this analysis and generalize it to nonlinear prediction tasks in
latent dynamical systems, extending theoretical guarantees to the case where
the map connecting latent variables and observations is known up to a linear
transform. In addition, we propose algorithms based on random features and
representation learning for the case when this map is unknown. A suite of
empirical results confirm theoretical findings and show the potential of using
privileged time-series information in nonlinear prediction.
( 2
min )
The ability to construct a realistic simulator of financial exchanges,
including reproducing the dynamics of the limit order book, can give insight
into many counterfactual scenarios, such as a flash crash, a margin call, or
changes in macroeconomic outlook. In recent years, agent-based models have been
developed that reproduce many features of an exchange, as summarised by a set
of stylised facts and statistics. However, the ability to calibrate simulators
to a specific period of trading remains an open challenge. In this work, we
develop a novel approach to the calibration of market simulators by leveraging
recent advances in deep learning, specifically using neural density estimators
and embedding networks. We demonstrate that our approach is able to correctly
identify high probability parameter sets, both when applied to synthetic and
historical data, and without reliance on manually selected or weighted
ensembles of stylised facts.
( 2
min )
We consider the problem of linear estimation, and establish an extension of
the Gauss-Markov theorem, in which the bias operator is allowed to be non-zero
but bounded with respect to a matrix norm of Schatten type. We derive simple
and explicit formulas for the optimal estimator in the cases of Nuclear and
Spectral norms (with the Frobenius case recovering ridge regression).
Additionally, we analytically derive the generalization error in multiple
random matrix ensembles, and compare with Ridge regression. Finally, we conduct
an extensive simulation study, in which we show that the cross-validated
Nuclear and Spectral regressors can outperform Ridge in several circumstances.
( 2
min )
A new exploratory technique called biarchetype analysis is defined. We extend
archetype analysis to find the archetypes of both observations and features
simultaneously. The idea of this new unsupervised machine learning tool is to
represent observations and features by instances of pure types (biarchetypes)
that can be easily interpreted as they are mixtures of observations and
features. Furthermore, the observations and features are expressed as mixtures
of the biarchetypes, which also helps understand the structure of the data. We
propose an algorithm to solve biarchetype analysis. We show that biarchetype
analysis offers advantages over biclustering, especially in terms of
interpretability. This is because biarchetypes are extreme instances, as opposed
to the centroids returned by biclustering, which favors human understanding.
Biarchetype analysis is applied to several machine learning problems to
illustrate its usefulness.
( 2
min )
A multitude of (dis)similarity measures between neural network
representations have been proposed, resulting in a fragmented research
landscape. Most of these measures fall into one of two categories.
First, measures such as linear regression, canonical correlations analysis
(CCA), and shape distances, all learn explicit mappings between neural units to
quantify similarity while accounting for expected invariances. Second, measures
such as representational similarity analysis (RSA), centered kernel alignment
(CKA), and normalized Bures similarity (NBS) all quantify similarity in summary
statistics, such as stimulus-by-stimulus kernel matrices, which are already
invariant to expected symmetries. Here, we take steps towards unifying these
two broad categories of methods by observing that the cosine of the Riemannian
shape distance (from category 1) is equal to NBS (from category 2). We explore
how this connection leads to new interpretations of shape distances and NBS,
and draw contrasts of these measures with CKA, a popular similarity measure in
the deep learning literature.
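Both families of measures are easy to state concretely. Below is a hedged numpy sketch of linear CKA and NBS on stimulus-by-unit matrices; the conventions (double-centering, linear kernels) follow common usage and may differ in detail from any particular paper.

```python
import numpy as np

def center(K):
    """Double-center a kernel (Gram) matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def psd_sqrt(K):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(K)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def linear_cka(X, Y):
    """Linear centered kernel alignment between two (stimuli x units) matrices."""
    Kx, Ky = center(X @ X.T), center(Y @ Y.T)
    return np.sum(Kx * Ky) / (np.linalg.norm(Kx) * np.linalg.norm(Ky))

def nbs(X, Y):
    """Normalized Bures similarity between the centered kernel matrices."""
    Kx, Ky = center(X @ X.T), center(Y @ Y.T)
    S = psd_sqrt(Kx)
    return np.trace(psd_sqrt(S @ Ky @ S)) / np.sqrt(np.trace(Kx) * np.trace(Ky))
```

Both scores are invariant to orthogonal transformations of the units, which is the kind of expected symmetry the abstract's two categories of measures share.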
( 2
min )
A way to counter AI risks could be to create AI risks. The question is by whom: a non-profit, a corporation, a nation, or a treaty? It may take extremes in systems, across tasks, to find out the depths of threats. If AI is used in weaponry, what are all the possible ways, such that… […]
The post Why generative AI safety research is beyond alignment appeared first on Data Science Central.
( 21
min )
In the dynamic world of streaming on Amazon Music, every search for a song, podcast, or playlist holds a story, a mood, or a flood of emotions waiting to be unveiled. These searches serve as a gateway to new discoveries, cherished experiences, and lasting memories. The search bar is not just about finding a song; […]
( 10
min )
This post is written in collaboration with Brad Duncan, Rachel Johnson and Richard Alcock from MathWorks. MATLAB is a popular programming tool for a wide range of applications, such as data processing, parallel computing, automation, simulation, machine learning, and artificial intelligence. It’s heavily used in many industries such as automotive, aerospace, communication, and manufacturing. In […]
( 10
min )
In this post, we demonstrate how to use the SageMaker Python SDK for text embedding and sentence similarity. Sentence similarity involves assessing the likeness between two pieces of text after they are converted into embeddings by the LLM, which is a foundation step for applications like Retrieval Augmented Generation (RAG).
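As a hedged illustration (not the SageMaker SDK itself, whose calls are elided in this excerpt), the sentence-similarity step reduces to cosine similarity between embedding vectors, and RAG-style retrieval to ranking documents by that score:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document embeddings most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])[:k]
```

In the post, `query_vec` and `doc_vecs` would come from an embedding-model endpoint; here they are plain numpy arrays.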
( 10
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Layout is a new feature that allows customers to automatically extract layout elements such as paragraphs, titles, subtitles, headers, footers, and more from documents. Layout extends Amazon Textract’s word and line detection by automatically […]
( 14
min )
Twelve teams of students and postdocs across the MIT community presented innovative startup ideas with potential for real-world impact.
( 11
min )
Genentech, a member of the Roche Group, is pioneering the use of generative AI to discover and develop new therapeutics and deliver treatments to patients more efficiently. A new collaboration between Genentech, the biotechnology pioneer, and NVIDIA aims to transform the discovery and development of new medicines by bringing together experts from each company to […]
( 6
min )
It’s the season of gratitude: that time of year to give thanks for the people and small moments that make life so special.
( 7
min )
So lately I've been getting a kick out of asking DALL-E 3 for images labeled with text. They're just good enough to be legible, but yet:
The food that gets duplicated seems to vary from spread to spread.
I also asked DALL-E 3 to do the dessert
( 4
min )
At Microsoft, we’re expanding AI capabilities by training small language models to achieve the kind of enhanced reasoning and comprehension typically found only in much larger models.
The post Orca 2: Teaching Small Language Models How to Reason appeared first on Microsoft Research.
( 10
min )
This work presents an analysis of the effectiveness of using standard shallow
feed-forward networks to mimic the behavior of the attention mechanism in the
original Transformer model, a state-of-the-art architecture for
sequence-to-sequence tasks. We substitute key elements of the attention
mechanism in the Transformer with simple feed-forward networks, trained using
the original components via knowledge distillation. Our experiments, conducted
on the IWSLT2017 dataset, reveal the capacity of these "attentionless
Transformers" to rival the performance of the original architecture. Through
rigorous ablation studies, and experimenting with various replacement network
types and sizes, we offer insights that support the viability of our approach.
This not only sheds light on the adaptability of shallow feed-forward networks
in emulating attention mechanisms but also underscores their potential to
streamline complex architectures for sequence-to-sequence tasks.
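The distillation setup can be sketched with a linear "student" fitted in closed form on teacher outputs; the paper trains shallow nonlinear feed-forward networks with gradient descent, so this is only a simplified stand-in for fixed-length sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention on one (T, d) sequence: the 'teacher'."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

d, T, n = 8, 4, 2000
Wq, Wk, Wv = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]

# Collect teacher outputs on random fixed-length sequences, flattened.
Xs = rng.standard_normal((n, T, d))
Ys = np.stack([self_attention(X, Wq, Wk, Wv).ravel() for X in Xs])
Xf = Xs.reshape(n, T * d)

# 'Distill': fit the student to mimic the teacher's outputs (least squares
# here; distillation with an FFN student would use SGD on the same targets).
W_student, *_ = np.linalg.lstsq(Xf, Ys, rcond=None)
mse = float(np.mean((Xf @ W_student - Ys) ** 2))
```

The residual `mse` measures how well a sequence-length-specific student mimics the attention block, which is the quantity the knowledge-distillation training drives down.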
( 2
min )
Federated Learning (FL) enables collaborative machine learning model training
across multiple parties without sharing raw data. However, FL's distributed
nature allows malicious clients to impact model training through Byzantine or
backdoor attacks, using erroneous model updates. Existing defenses measure the
deviation of each update from a 'ground-truth model update.' They often rely on
a benign root dataset on the server or use trimmed mean or median for clipping,
both methods having limitations.
We introduce FedTruth, a robust defense against model poisoning in FL.
FedTruth neither assumes specific data distributions nor requires a benign root
dataset. It estimates a global model update with dynamic aggregation weights,
considering contributions from all benign clients. Empirical studies
demonstrate FedTruth's efficacy in mitigating the impacts of poisoned updates
from both Byzantine and backdoor attacks.
( 2
min )
Decades of research indicate that emotion recognition is more effective when
drawing information from multiple modalities. But what if some modalities are
sometimes missing? To address this problem, we propose a novel
Transformer-based architecture for recognizing valence and arousal in a
time-continuous manner even with missing input modalities. We use a coupling of
cross-attention and self-attention mechanisms to emphasize relationships
between modalities during time and enhance the learning process on weak salient
inputs. Experimental results on the Ulm-TSST dataset show that our model
exhibits a 37% improvement in the concordance correlation coefficient when
predicting arousal values and a 30% improvement when predicting valence values,
compared to a late-fusion baseline approach.
( 2
min )
Online High Definition Map (HDMap) estimation from sensors offers a low-cost
alternative to manually acquired HDMaps. As such, it promises to lighten costs
for already HDMap-reliant Autonomous Driving systems, and potentially even
spread their use to new systems. In this paper, we propose to improve online
HDMap estimation by accounting for already existing maps. We identify 3
reasonable types of useful existing maps (minimalist, noisy, and outdated). We
also introduce MapEX, a novel online HDMap estimation framework that accounts
for existing maps. MapEX achieves this by encoding map elements into query
tokens and by refining the matching algorithm used to train classic query based
map estimation models. We demonstrate that MapEX brings significant
improvements on the nuScenes dataset. For instance, MapEX - given noisy maps -
improves by 38% over the MapTRv2 detector it is based on and by 16% over the
current SOTA.
( 2
min )
Despite the widespread use and success of machine-learning techniques for
detecting phase transitions from data, their working principle and fundamental
limits remain elusive. Here, we explain the inner workings and identify
potential failure modes of these techniques by rooting popular machine-learning
indicators of phase transitions in information-theoretic concepts. Using tools
from information geometry, we prove that several machine-learning indicators of
phase transitions approximate the square root of the system's (quantum) Fisher
information from below -- a quantity that is known to indicate phase
transitions but is often difficult to compute from data. We numerically
demonstrate the quality of these bounds for phase transitions in classical and
quantum systems.
( 2
min )
Graph neural networks have been successful in machine learning, as well as
in combinatorial and graph problems such as the Subgraph Isomorphism Problem
and the Traveling Salesman Problem. We describe an approach for computing graph
sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We
first train a graph neural network that takes as input a partial solution and
proposes a new node to be added as output. This neural network is then used in
a Monte Carlo search to compute a sparsifier. The proposed method consistently
outperforms several standard approximation algorithms on different types of
graphs and often finds the optimal solution.
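The construct-and-search loop can be sketched as follows, with a stand-in scoring function in place of the trained GNN and policy-shortlisted random rollouts as a lightweight proxy for full Monte Carlo Tree Search; the toy objective, shortlist size, and rollout count are illustrative assumptions.

```python
import random

def build_subset(nodes, policy_score, value, target_size,
                 shortlist=3, n_rollouts=30, seed=0):
    """Grow a node subset one element at a time: the policy (a stand-in
    for the trained GNN) shortlists candidates from the partial solution,
    and random rollouts to a complete solution estimate each candidate's
    value, approximating the role of Monte Carlo Tree Search."""
    rng = random.Random(seed)
    sol = []
    while len(sol) < target_size:
        cands = [v for v in nodes if v not in sol]
        top = sorted(cands, key=lambda v: policy_score(sol, v),
                     reverse=True)[:shortlist]
        def avg_rollout(v):
            total = 0.0
            for _ in range(n_rollouts):
                s = sol + [v]
                rest = [u for u in nodes if u not in s]
                rng.shuffle(rest)
                s += rest[: target_size - len(s)]  # random completion
                total += value(s)
            return total / n_rollouts
        sol.append(max(top, key=avg_rollout))
    return sol

# Toy instance: the value of a subset is the sum of its node ids, and
# the "policy" simply scores each node by its id.
sol = build_subset(list(range(10)), lambda s, v: v, sum, target_size=3)
```

In the real method the rollout estimates would come from a proper MCTS tree and the sparsifier-quality objective would replace the toy `value` function.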
( 2
min )
Tabular classification has traditionally relied on supervised algorithms,
which estimate the parameters of a prediction model using its training data.
Recently, Prior-Data Fitted Networks (PFNs) such as TabPFN have successfully
learned to classify tabular data in-context: the model parameters are designed
to classify new samples based on labelled training samples given after the
model training. While such models show great promise, their applicability to
real-world data remains limited due to the computational scale needed. Here we
study the following question: given a pre-trained PFN for tabular data, what is
the best way to summarize the labelled training samples before feeding them to
the model? We conduct an initial investigation of sketching and
feature-selection methods for TabPFN, and note certain key differences between
it and conventionally fitted tabular models.
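As one concrete example of summarizing the context, a stratified random sketch keeps a class-balanced subset of labelled rows under a fixed budget. This is a generic baseline sketch (the data and budget are made up, and TabPFN itself is not called here):

```python
import numpy as np

def sketch_context(X, y, budget, rng):
    """Stratified uniform sketch: keep roughly `budget` labelled rows,
    preserving class proportions, before handing them to an in-context
    model such as TabPFN (the model call itself is out of scope here)."""
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for cls, cnt in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        n = max(1, round(budget * cnt / len(y)))  # per-class quota
        keep.append(rng.choice(idx, size=min(n, cnt), replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 3, size=1000)
Xs, ys = sketch_context(X, y, budget=100, rng=rng)
```

More elaborate summaries (coresets, feature selection) slot into the same interface: shrink `(X, y)` first, then pass the result as the in-context training set.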
( 2
min )
Uncovering the mechanisms behind long-term memory is one of the most
fascinating open problems in neuroscience and artificial intelligence.
Artificial associative memory networks have been used to formalize important
aspects of biological memory. Generative diffusion models are a class of
generative machine-learning models that have shown strong performance in
many tasks. Like associative memory systems, these networks define a dynamical
system that converges to a set of target states. In this work we show that
generative diffusion models can be interpreted as energy-based models and that,
when trained on discrete patterns, their energy function is (asymptotically)
identical to that of modern Hopfield networks. This equivalence allows us to
interpret the supervised training of diffusion models as a synaptic learning
process that encodes the associative dynamics of a modern Hopfield network in
the weight structure of a deep neural network. Leveraging this connection, we
formulate a generalized framework for understanding the formation of long-term
memory, where creative generation and memory recall can be seen as parts of a
unified continuum.
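For reference, the modern continuous Hopfield energy that the diffusion energy is said to match asymptotically has a log-sum-exp form; writing the stored patterns as $x_\mu$, the state as $\xi$, and the inverse temperature as $\beta$ (notation assumed here, and only up to constants):

```latex
E(\xi) \;=\; -\frac{1}{\beta}\log\sum_{\mu=1}^{M}\exp\!\left(\beta\, x_\mu^{\top}\xi\right)
\;+\; \frac{1}{2}\,\lVert\xi\rVert^{2} \;+\; \text{const.}
```

Gradient descent on this energy converges to (retrieves) one of the stored patterns, which is the associative dynamics the diffusion model's denoising flow is argued to reproduce.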
( 2
min )
We present Emu Video, a text-to-video generation model that factorizes the
generation into two steps: first generating an image conditioned on the text,
and then generating a video conditioned on the text and the generated image. We
identify critical design decisions--adjusted noise schedules for diffusion, and
multi-stage training--that enable us to directly generate high quality and high
resolution videos, without requiring a deep cascade of models as in prior work.
In human evaluations, our generated videos are strongly preferred in quality
compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's
PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial
solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing
approach naturally lends itself to animating images based on a user's text
prompt, where our generations are preferred 96% over prior work.
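The two-step factorization can be written compactly: with text prompt $t$, sampling first draws an image $i$ and then a video $v$ conditioned on both:

```latex
i \sim p(i \mid t), \qquad v \sim p(v \mid t,\, i)
```

Conditioning the second stage on a concrete first frame is what lets a single video model replace the deep cascade of models used in prior work.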
( 2
min )
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima
and has been shown to enhance generalization performance in various settings.
In this work we show that perturbing only the affine normalization parameters
(typically comprising 0.1% of the total parameters) in the adversarial step of
SAM can outperform perturbing all of the parameters. This finding generalizes to
different SAM variants and both ResNet (Batch Normalization) and Vision
Transformer (Layer Normalization) architectures. We consider alternative sparse
perturbation approaches and find that these do not achieve similar performance
enhancement at such extreme sparsity levels, showing that this behaviour is
unique to the normalization layers. Although our findings reaffirm the
effectiveness of SAM in improving generalization performance, they cast doubt
on whether this is solely caused by reduced sharpness.
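A minimal sketch of the norm-only adversarial step, assuming plain dicts of weights and gradients; the parameter names, shapes, and rho value are illustrative, not the paper's exact setup:

```python
import numpy as np

def sam_norm_only_step(params, grads, norm_mask, rho=0.05):
    """One SAM adversarial perturbation restricted to the affine
    normalization parameters (gamma/beta). Returns the perturbed weights
    at which SAM's second gradient would be evaluated; all other
    parameters are left untouched."""
    # Gradient norm restricted to the normalization parameters.
    g_norm = np.sqrt(sum(np.sum(grads[n] ** 2) for n in norm_mask)) + 1e-12
    perturbed = {}
    for name, w in params.items():
        if name in norm_mask:
            # Ascend along the gradient only for the norm-layer affines.
            perturbed[name] = w + rho * grads[name] / g_norm
        else:
            perturbed[name] = w
    return perturbed

params = {"conv.weight": np.ones((2, 2)), "bn.gamma": np.array([1.0, 1.0])}
grads = {"conv.weight": np.full((2, 2), 0.5), "bn.gamma": np.array([3.0, 4.0])}
out = sam_norm_only_step(params, grads, {"bn.gamma"}, rho=0.05)
```

Because `norm_mask` typically covers only ~0.1% of the weights, the extra forward/backward pass of SAM touches a vanishingly small parameter subset.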
( 2
min )
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates on whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process: after adoption, the projects more
quickly reviewed PRs that ended up merged and resolved PRs that ended up closed.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.
( 3
min )
Automated creation of synthetic traffic scenarios is a key part of validating
the safety of autonomous vehicles (AVs). In this paper, we propose Scenario
Diffusion, a novel diffusion-based architecture for generating traffic
scenarios that enables controllable scenario generation. We combine latent
diffusion, object detection and trajectory regression to generate distributions
of synthetic agent poses, orientations and trajectories simultaneously. To
provide additional control over the generated scenario, this distribution is
conditioned on a map and sets of tokens describing the desired scenario. We
show that our approach has sufficient expressive capacity to model diverse
traffic patterns and generalizes to different geographical regions.
( 2
min )
Semiparametric efficient estimation of various multi-valued causal effects,
including quantile treatment effects, is important in economic, biomedical, and
other social sciences. Under the unconfoundedness condition, adjustment for
confounders requires estimating the nuisance functions relating outcome or
treatment to confounders nonparametrically. This paper considers a generalized
optimization framework for efficient estimation of general treatment effects
using artificial neural networks (ANNs) to approximate the unknown nuisance
function of growing-dimensional confounders. We establish a new approximation
error bound for the ANNs to the nuisance function belonging to a mixed
smoothness class without a known sparsity structure. We show that the ANNs can
alleviate the "curse of dimensionality" under this circumstance. We establish
the root-$n$ consistency and asymptotic normality of the proposed general
treatment effects estimators, and apply a weighted bootstrap procedure for
conducting inference. The proposed methods are illustrated via simulation
studies and a real data application.
( 2
min )
From classical HPC to deep learning, MatMul is at the heart of today's
computing. The recent Maddness method approximates MatMul without the need for
multiplication by using a hash-based version of product quantization (PQ)
indexing into a look-up table (LUT). Stella Nera is the first Maddness
accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more
than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators
implemented in the same technology. The hash function is a decision tree, which
allows for an efficient hardware implementation as the multiply-accumulate
operations are replaced by decision tree passes and LUT lookups. The entire
Maddness MatMul can be broken down into parts that allow an effective
implementation with small computing units and memories, allowing it to reach
extreme efficiency while remaining generically applicable for MatMul tasks. In
a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency
of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of more than 92.5% using
ResNet9.
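The LUT-based approximation can be sketched with plain product quantization standing in for Maddness's decision-tree hash; all sizes and data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M, C, K = 64, 8, 4, 4, 16   # C subspaces of width D//C, K prototypes each
A = rng.normal(size=(N, D))
B = rng.normal(size=(D, M))
d = D // C

# K prototypes per subspace (random data rows here; Maddness instead
# learns decision trees so that encoding needs no multiplications).
protos = np.stack([A[rng.choice(N, K, replace=False), c*d:(c+1)*d]
                   for c in range(C)])

# Encode each row of A: nearest prototype index per subspace.
codes = np.empty((N, C), dtype=int)
for c in range(C):
    dists = ((A[:, None, c*d:(c+1)*d] - protos[c][None]) ** 2).sum(-1)
    codes[:, c] = dists.argmin(1)

# Precompute LUT[c, k, m] = <prototype k of subspace c, B rows of subspace c>.
lut = np.einsum('ckd,cdm->ckm', protos, B.reshape(C, d, M))

# "Multiplication-free" inference: each output entry is C table lookups + adds.
approx = sum(lut[c, codes[:, c]] for c in range(C))
err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
```

In hardware, the encode step becomes decision-tree passes and the inner product becomes LUT reads and accumulations, which is what enables the reported area and energy efficiency.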
( 2
min )
The evolution of data management has kept pace with the rapid increase in data generation: after beginning with straightforward relational databases and ETL, big data and unstructured data paved the way for the development of automated data pipelines and lakes. But this data cascade appears to have no stop in sight. Contemporary data surpasses…
The post From Confusion to Clarity: How AI Simplifies Data Management for Enterprises appeared first on Data Science Central.
( 21
min )
Last night’s changes in AI have been seismic following the shock resignation of Sam Altman. It is still early days, and these changes will continue to play out. Undoubtedly, this change will impact AI roadmaps worldwide. So, how should you recalibrate your AI roadmap after Sam Altman's departure from OpenAI? Has anything really changed? I was…
The post Should you recalibrate your AI roadmap post changes in OpenAI? appeared first on Data Science Central.
( 20
min )
Retrieval Augmented Generation (RAG) allows you to provide a large language model (LLM) with access to data from external knowledge sources such as repositories, databases, and APIs without the need to fine-tune it. When using generative AI for question answering, RAG enables LLMs to answer questions with the most relevant, up-to-date information and optionally cite […]
( 14
min )
KT Corporation is one of the largest telecommunications providers in South Korea, offering a wide range of services including fixed-line telephone, mobile communication, internet, and AI services. KT’s AI Food Tag is an AI-based dietary management solution that identifies the type and nutritional content of food in photos using a computer vision model. This […]
( 11
min )
Lifelong model editing fixes mistakes discovered after model deployment. This work could expand sequential editing to model properties like fairness and privacy and enable a new class of solutions for adapting LLMs over long deployment lifetimes.
The post Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting appeared first on Microsoft Research.
( 12
min )
MIT CSAIL researchers innovate with synthetic imagery to train AI, paving the way for more efficient and bias-reduced machine learning.
( 9
min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech-talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )